Abstract

We consider the problem of recovering items matching a partially specified pattern in multidimensional
trees (quadtrees and k-d trees). We assume the classical model where the data consist of independent
and uniform points in the unit square. For this model, in a structure on $n$ points, it is known that the
complexity, measured as the number of nodes $C_n(\xi)$ to visit in order to report the items matching a
random query $\xi$, independent and uniformly distributed on $[0,1]$, satisfies
$\mathbf{E}[C_n(\xi)] \sim \kappa n^{\beta}$, where $\kappa$ and $\beta$ are explicit constants. We develop an approach
based on the analysis of the cost $C_n(s)$ of any fixed query $s \in [0,1]$, and give precise estimates for the
variance and limit distribution. Moreover, a functional limit law for a rescaled version of the process
$(C_n(s))_{0 \le s \le 1}$ is derived in the space of cadlag functions with the Skorokhod topology. For the
worst case complexity $\max_{s \in [0,1]} C_n(s)$ the order of the expectation as well as a limit law are given.

AMS 2010 subject classifications. Primary 60F17, 68Q25; secondary 60C05, 60G18. 

Key words. search trees, partial match query, functional limit theorem, recursive distributional equation,
analysis of algorithms, weak convergence, quadtree, k-d tree.

1 Introduction 

In the probabilistic analysis of algorithms, the complexities of algorithms and data structures are analyzed
under the assumption that the input data are random. We consider a fundamental search operation, the
so-called partial match query, in data structures holding multidimensional data.

Multidimensional databases arise in a number of contexts such as computer graphics, management of 
geographical data or statistical analysis. The question of retrieving the data matching a specified pattern 
is of prime importance. If the pattern specifies all the data fields, the query can generally be answered
in logarithmic time; precise analyses are available in this case, see, e.g., [16, 18, 20, 24, 25]. We will be
interested in partial match queries, i.e., the case when the pattern only constrains some of the data fields. 

The first investigations of partial match queries by Rivest [34] were based on digital structures. In
a comparison-based setting, a few general purpose data structures generalizing binary search trees make it
possible to answer partial match queries, namely the quadtree [15], the k-d tree [1] and the relaxed k-d tree [11].
Aside from the interest that one might have in partial match for itself, there are numerous reasons that justify
the precise quantification of the cost of such general search queries in comparison-based data structures.
High-dimensional trees are indeed a data structure of choice for applications that range from collision
detection in motion planning to mesh generation that takes advantage of the adaptive partition of space that
is produced [22, 41]. For general references on multidimensional data structures and more details about
their various applications, see the series of monographs by Samet [38, 39, 40]. The cost of partial match
queries also appears in (hence influences) the complexity of a number of other geometrical search questions
such as range search [10] or rank selection [12].

For the analysis of the complexity of partial match queries, probabilistic models for the data
and the query have usually been assumed, and mainly the asymptotic behavior of the expected complexity has
been investigated. In this paper, we provide refined analyses of the costs of partial match queries in some of
the most important two-dimensional data structures. We focus on the cases of quadtrees and k-d trees.

Quadtrees and multidimensional search. The quadtree [15] makes it possible to manage multidimensional
data by extending the divide-and-conquer approach of the binary search tree. Consider the point sequence
$p_1, p_2, \dots, p_n \in [0,1]^2$. As we build the tree, regions of the unit square are associated to the nodes where
the points are stored. Initially, the root is associated with the region $[0,1]^2$ and the data structure is empty.
The first point $p_1$ is stored at the root, and divides the unit square into four regions $Q_1, \dots, Q_4$. Each region
is assigned to a child of the root. More generally, when $i$ points have already been inserted, we have a set
of $1 + 3i$ (lower-level) regions that cover the unit square. The point $p_{i+1}$ is stored in the node (say $u$) that
corresponds to the region it falls in, and divides it into four new regions that are assigned to the children of $u$.
See Figure 1.
Figure 1: An example of a (point) quadtree: on the left the partition of the unit square induced by the tree data structure
on the right (the children are ordered according to the numbering of the regions on the left). Answering the partial match
query materialized by the dashed line on the left requires visiting the points/nodes coloured in red. Note that each one
of the visited nodes corresponds to a horizontal line that is crossed by the query.
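
The construction just described, and the partial match query illustrated in Figure 1, can be sketched in a few lines of Python. This is only a minimal illustration, not the authors' code: the names `Node`, `insert` and `partial_match_cost`, the ordering of the four children, and the convention that ties go to the right/upper region are all assumptions made for the example. The query is a vertical line at first coordinate `s`, and the function counts the nodes visited.

```python
import random


class Node:
    """A point quadtree node (hypothetical layout: children indexed 0..3)."""
    __slots__ = ("x", "y", "children")

    def __init__(self, x, y):
        self.x, self.y = x, y
        self.children = [None, None, None, None]


def quadrant(node, x, y):
    # Assumed convention: bit 1 set if the point is to the right of the node,
    # bit 0 set if it is above; ties go right/up.
    return (2 if x >= node.x else 0) + (1 if y >= node.y else 0)


def insert(root, x, y):
    """Insert (x, y) and return the (possibly new) root."""
    if root is None:
        return Node(x, y)
    node = root
    while True:
        q = quadrant(node, x, y)
        if node.children[q] is None:
            node.children[q] = Node(x, y)
            return root
        node = node.children[q]


def partial_match_cost(root, s):
    """Number of nodes visited by a query on the vertical line x = s."""
    count, stack = 0, [root]
    while stack:
        node = stack.pop()
        if node is None:
            continue
        count += 1
        # The line crosses exactly the two regions on one side of the node.
        base = 0 if s < node.x else 2
        stack.append(node.children[base])
        stack.append(node.children[base + 1])
    return count
```

Inserting independent uniform points, for instance with `random.random()`, and calling `partial_match_cost(root, s)` gives one sample of the query cost at `s` under the random model considered in this paper.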

Analysis of partial match retrieval. For the analysis, we will focus on the model of random
quadtrees, where the data points are independent and uniformly distributed in the unit square. In the present
case, the data are just points, and the problem of partial match retrieval consists in reporting all the data with
one of the coordinates (say the first) being $s \in [0,1]$. It is a simple observation that the number of nodes
of the tree visited when performing the search is precisely $C_n(s)$, the number of regions in the quadtree
that intersect a vertical line at $s$. The first analysis of partial match in quadtrees is due to Flajolet et al. [19]
(after the pioneering work of Flajolet and Puech [17] in the case of k-d trees). They studied the singularities
of a differential system for the generating functions of partial match cost to prove that, for a random query
$\xi$, independent of the tree and uniformly distributed on $[0,1]$,

$$\mathbf{E}[C_n(\xi)] \sim \kappa\, n^{\beta}, \tag{1}$$

where

$$\kappa = \frac{\Gamma(2\beta+2)}{2\,\Gamma(\beta+1)^{3}}, \qquad \beta = \frac{\sqrt{17}-3}{2},$$

and $\Gamma(x)$ denotes the Gamma function $\Gamma(x) = \int_0^\infty t^{x-1} e^{-t}\,dt$. This has since been strengthened by Chern
and Hwang [4], who provided the order of the error term (together with the values of the leading constant
in all dimensions). The most precise result is (6.2) there, saying that



(2) 
To gain a refined understanding of the cost beyond the level of expectations we pursue two directions.
First, to quantify the order of typical deviations from the mean we study the order of the variance together
with limit distributions. However, deriving higher moments turns out to be subtle. In particular, when the
query line is random (as in the uniform case), although the four subtrees at the root are independent
given their sizes, the contributions of the two subtrees that do hit the query line are dependent. The relative
location of the query line inside these two subtrees is again uniform, but unfortunately it is the same in both
regions. Hence, one cannot easily set up recurrence relations and perform an asymptotic analysis exploiting
independence. This issue has not yet been addressed appropriately, and there is currently no result on the
variance or higher moments for $C_n(\xi)$.

The second issue lies in the definition of the cost measure: even if the data follow some distribution,
should one assume that the query is uniformly random? In other words, should we focus on $C_n(\xi)$? Maybe
not. But then, what distribution should one use for the query line?

One possible approach to overcome both problems is to consider the query line to be fixed and to study
$C_n(s)$ for $s \in [0,1]$. This raises another problem: even if $s$ is fixed at the top level, as the search is
performed, the relative location of the queries in the recursive calls varies from one node to another. Thus,
in following this approach, one is led to consider the entire stochastic process $(C_n(s))_{s \in [0,1]}$; this is the
method we use here.

Recently Curien and Joseph [6] obtained some results in this direction. They proved that for every fixed
$s \in (0,1)$,

$$\mathbf{E}[C_n(s)] \sim K_1 \,(s(1-s))^{\beta/2}\, n^{\beta}, \tag{3}$$

with

$$K_1 = \frac{\Gamma(2\beta+2)\,\Gamma(\beta+2)}{2\,\Gamma(\beta+1)^{3}\,\Gamma(\beta/2+1)^{2}}. \tag{4}$$

On the other hand, Flajolet et al. [19, 20] prove that, along the edge, one has $\mathbf{E}[C_n(0)] = \Theta(n^{\sqrt{2}-1})$, so that
$\mathbf{E}[C_n(0)] = o(n^{\beta})$ (see also [6]). The behavior about the x-coordinate $U$ of the first data point certainly
resembles that along the edge, so that one has $\mathbf{E}[C_n(U)] = o(n^{\beta})$. This suggests that $C_n(s)$ should not
be concentrated around its mean, and that $n^{-\beta} C_n(s)$ should converge to a non-trivial random variable as
$n \to \infty$. Below, we verify a functional limit law for $(n^{-\beta} C_n(s))_{s \in [0,1]}$ and characterize the limit process.
From this we obtain refined asymptotic information on the complexity of partial match queries in quadtrees.
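
To make the constants in (1), (3) and (4) concrete, the following Python sketch evaluates them numerically with the standard library. The function name `mean_cost_leading_term` is an assumption made for the example; it only evaluates the leading term of (3) and says nothing about lower-order terms.

```python
import math

# Exponent and constants from (1), (3) and (4).
BETA = (math.sqrt(17) - 3) / 2
KAPPA = math.gamma(2 * BETA + 2) / (2 * math.gamma(BETA + 1) ** 3)
K1 = (math.gamma(2 * BETA + 2) * math.gamma(BETA + 2)) / (
    2 * math.gamma(BETA + 1) ** 3 * math.gamma(BETA / 2 + 1) ** 2
)


def mean_cost_leading_term(n, s):
    """Leading term K1 * (s(1-s))^(beta/2) * n^beta of E[C_n(s)], for 0 < s < 1."""
    return K1 * (s * (1 - s)) ** (BETA / 2) * n ** BETA


# Sanity check: integrating (s(1-s))^(beta/2) over [0, 1] gives the Beta
# integral Gamma(beta/2+1)^2 / Gamma(beta+2), so K1 times it equals kappa.
AVERAGED = K1 * math.gamma(BETA / 2 + 1) ** 2 / math.gamma(BETA + 2)
```

Numerically, `BETA` is about 0.5616 and `KAPPA` about 1.595; the value `AVERAGED` coincides with `KAPPA`, which is the algebraic link between the fixed-query constant $K_1$ and the uniform-query constant $\kappa$.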



2 Main results and implications 

We denote by $\mathcal{D}[0,1]$ the space of cadlag functions on $[0,1]$ and by $\|f\| := \sup_{t \in [0,1]} |f(t)|$ the uniform
norm of $f \in \mathcal{D}[0,1]$. Our main contribution is to prove the following convergence result:

Theorem 1. Let $C_n(s)$ be the cost of a partial match query at a fixed line $s$ in a random quadtree. Then,
there exists a random continuous function $Z$ such that, as $n \to \infty$,

$$\left( \frac{C_n(s)}{K_1 n^{\beta}},\ s \in [0,1] \right) \xrightarrow{d} \left( Z(s),\ s \in [0,1] \right). \tag{5}$$

This convergence in distribution holds in $\mathcal{D}[0,1]$ equipped with the Skorokhod topology.

The convergence in (5) in particular implies the convergence in distribution of the finite dimensional
marginals

$$\left( \frac{C_n(s_1)}{K_1 n^{\beta}}, \dots, \frac{C_n(s_k)}{K_1 n^{\beta}} \right) \xrightarrow{d} \left( Z(s_1), \dots, Z(s_k) \right)$$

as $n \to \infty$, for any natural number $k$ and points $s_1, s_2, \dots, s_k \in [0,1]$ [see, e.g., 2].
The limit process $Z$ may be characterized as follows (see Figure 2 for a simulation):
Proposition 2. The distribution of the random function $Z$ in (5) is a fixed point of the following functional
recursive distributional equation, as a process in $s \in [0,1]$,

$$Z(s) \stackrel{d}{=} \mathbf{1}_{\{s<U\}} \left[ (UV)^{\beta} Z^{(1)}\!\left(\frac{s}{U}\right) + (U(1-V))^{\beta} Z^{(2)}\!\left(\frac{s}{U}\right) \right]
+ \mathbf{1}_{\{s \ge U\}} \left[ ((1-U)V)^{\beta} Z^{(3)}\!\left(\frac{s-U}{1-U}\right) + ((1-U)(1-V))^{\beta} Z^{(4)}\!\left(\frac{s-U}{1-U}\right) \right], \tag{6}$$

where $U$ and $V$ are independent $[0,1]$-uniform random variables and $Z^{(i)}$, $i = 1, \dots, 4$, are independent
copies of the process $Z$, which are also independent of $U$ and $V$. Furthermore, $Z$ in (5) is the only
continuous solution of (6) such that $\mathbf{E}[Z(s)] = (s(1-s))^{\beta/2}$ for all $s \in [0,1]$ and $\mathbf{E}[\|Z\|^2] < \infty$.

It turns out in the proofs below that the convergence that implies Theorem 1 is strong enough to
guarantee convergence of the variance of the costs of partial match queries. The following theorem for uniform
queries $\xi$ is the direct extension of the pioneering work of Flajolet and Puech [17], Flajolet et al. [19] for
the cost of partial match queries at a uniform line $\xi$ in random two-dimensional trees.

Theorem 3. If $\xi$ is uniformly distributed on $[0,1]$, independent of $(C_n)$ and $Z$, then

$$\frac{C_n(\xi)}{K_1 n^{\beta}} \to Z(\xi)$$

in distribution. Moreover, we have

$$\mathrm{Var}(C_n(\xi)) \sim K_1^2 \,\mathrm{Var}(Z(\xi))\, n^{2\beta},$$

where, with $K_1$ given in (4),

$$K_1^2 \cdot \mathrm{Var}(Z(\xi)) = K_1^2 \left( \frac{2(2\beta+1)}{3(1-\beta)}\, B(\beta+1, \beta+1)^2 - B(\beta/2+1, \beta/2+1)^2 \right) \approx 0.447363034.$$

Here $B(a,b) := \int_0^1 t^{a-1}(1-t)^{b-1}\,dt$ denotes the Eulerian integral for $a, b > 0$. In particular,
Theorem 3 identifies the asymptotic order of $\mathrm{Var}(C_n(\xi))$, which is to be compared with studies that neglected
the dependence between the contributions of the subtrees mentioned above [26, 27, 29]. A refined result for
the asymptotic order of $\mathrm{Var}(C_n(s))$ is

$$\mathrm{Var}(C_n(s)) \sim K_1^2 \,\mathrm{Var}(Z(s))\, n^{2\beta},$$

where $s \in (0,1)$ and an explicit expression for $\mathrm{Var}(Z(s))$ is given in (42).
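
As a numerical cross-check of the variance constant in Theorem 3, the sketch below evaluates $K_1^2 \cdot \mathrm{Var}(Z(\xi))$ from the closed form, using $B(a,b) = \Gamma(a)\Gamma(b)/\Gamma(a+b)$. The helper name `beta_fn` and the variable names are assumptions made for the example.

```python
import math


def beta_fn(a, b):
    """Eulerian integral B(a, b) via the Gamma function (valid for a, b > 0)."""
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)


BETA = (math.sqrt(17) - 3) / 2
K1 = (math.gamma(2 * BETA + 2) * math.gamma(BETA + 2)) / (
    2 * math.gamma(BETA + 1) ** 3 * math.gamma(BETA / 2 + 1) ** 2
)

# Var(Z(xi)) = E[Z(xi)^2] - E[Z(xi)]^2 for a uniform query xi.
second_moment = 2 * (2 * BETA + 1) / (3 * (1 - BETA)) * beta_fn(BETA + 1, BETA + 1) ** 2
squared_mean = beta_fn(BETA / 2 + 1, BETA / 2 + 1) ** 2
var_Z_xi = second_moment - squared_mean

# Leading constant of Var(C_n(xi)) / n^(2*beta).
variance_constant = K1 ** 2 * var_Z_xi
```

The variance is a difference of two terms of similar size (roughly 0.40 and 0.34), so a small relative error in either Beta integral is amplified in `variance_constant`; double precision is more than sufficient here.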

Another consequence of Theorem 1 concerns the order of the cost of the worst query $\sup_{s \in [0,1]} C_n(s)$.

Theorem 4. Let $S_n = \sup_{s \in [0,1]} C_n(s)$. Then, as $n \to \infty$,

$$n^{-\beta} S_n \to S := K_1 \sup_{s \in [0,1]} Z(s)$$

in distribution and with convergence of all moments. In particular,

$$\mathbf{E}[S_n] \sim n^{\beta}\, \mathbf{E}[S], \qquad \mathrm{Var}(S_n) \sim n^{2\beta}\, \mathrm{Var}(S).$$

Note in particular that the asymptotic order of $\mathbf{E}[S_n]$ does not include an extra logarithmic factor.
The one-dimensional marginals of the limit process $(Z(s), s \in [0,1])$ are all the same up to a
multiplicative constant:

Theorem 5. There is a random variable $Z \ge 0$ such that for all $s \in [0,1]$,

$$Z(s) \stackrel{d}{=} (s(1-s))^{\beta/2}\, Z. \tag{7}$$

The distribution of $Z$ is characterized by its moments $c_m := \mathbf{E}[Z^m]$, $m \in \mathbb{N}$. They are given by $c_1 = 1$ and
the recurrence, for $m \ge 2$,

$$c_m = \frac{\beta m + 1}{(m-1)(m + 1 - 3\beta m/2)} \sum_{\ell=1}^{m-1} \binom{m}{\ell} B(\beta\ell + 1, \beta(m-\ell) + 1)\, c_\ell\, c_{m-\ell}. \tag{8}$$
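
The recurrence (8) is straightforward to evaluate numerically. The following Python sketch computes the first moments $c_m$ with memoization; the names `moment` and `beta_fn` are assumptions made for the example, and plain floating-point arithmetic is used, which is adequate for small $m$.

```python
import math
from functools import lru_cache

BETA = (math.sqrt(17) - 3) / 2


def beta_fn(a, b):
    """Eulerian integral B(a, b) via the Gamma function."""
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)


@lru_cache(maxsize=None)
def moment(m):
    """c_m = E[Z^m] from recurrence (8), with c_1 = 1."""
    if m == 1:
        return 1.0
    prefactor = (BETA * m + 1) / ((m - 1) * (m + 1 - 1.5 * BETA * m))
    total = sum(
        math.comb(m, l)
        * beta_fn(BETA * l + 1, BETA * (m - l) + 1)
        * moment(l)
        * moment(m - l)
        for l in range(1, m)
    )
    return prefactor * total
```

For instance, `moment(2)` is about 1.137, so the variance of $Z$ is about 0.137, and by (7) the variance of $Z(s)$ is this value multiplied by $(s(1-s))^{\beta}$.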
Convergence of all moments of the supremum $n^{-\beta} S_n$ in Theorem 4 implies uniform integrability of
any moment of the process $n^{-\beta} C_n$, hence the following result about convergence of all moments.

Theorem 6. For all $s \in [0,1]$, we have

$$\mathbf{E}\left[ \left( \frac{C_n(s)}{K_1 n^{\beta}} \right)^{m} \right] \to c_m\, (s(1-s))^{\beta m/2}$$

for all $m \in \mathbb{N}$ as $n \to \infty$, where $c_m$ is given in (8). The analogous result holds true for $C_n(\xi)$ where $\xi$ is
uniform on $[0,1]$ and independent of $(C_n)_{n \ge 0}$ and $Z$. Moreover, for any natural number $\ell > 0$, positions
$0 \le s_1 < \dots < s_\ell \le 1$, and $k_1, \dots, k_\ell \in \mathbb{N}$ one has

$$\mathbf{E}\left[ C_n(s_1)^{k_1} \cdots C_n(s_\ell)^{k_\ell} \right] \sim (K_1 n^{\beta})^{k_1 + \dots + k_\ell}\; \mathbf{E}\left[ Z(s_1)^{k_1} \cdots Z(s_\ell)^{k_\ell} \right].$$



Plan of the paper. Our approach requires working with the process $(C_n(s) : s \in [0,1])$ and is based
on the recursive decomposition of the tree at the root. This yields a distributional recurrence for
$(C_n(s) : s \in [0,1])$ to which we apply a functional version of the contraction method. In Section 3, we
give an overview of this underlying methodology. In particular, we discuss the novel results of Neininger
and Sulzbach [32] about the contraction method in function spaces which we will apply. Sections 4 and 5
are dedicated to the proofs of two of the main ingredients required to apply the results of [32]: the existence
of a continuous solution of the limit recursive equation, and the uniform convergence of the rescaled first
moment $n^{-\beta}\mathbf{E}[C_n(s)]$ at an appropriate rate. In Section 6, we identify the variance and the supremum of
the limit process $Z$, and deduce the large $n$ asymptotics for $C_n(s)$ in Theorems 3 and 4. Finally, we prove
analogous results for the case of 2-d trees in Section 7. Our results on quadtrees have been announced in
the extended abstract [3].



3 Contraction method in function spaces 
3.1 Overview of the method 



The aim of this section is to give an overview of the method we employ to prove Theorem 1. It is based
on a contraction argument in a certain space of probability distributions. In the context of the analysis
of performance of algorithms, the method was first employed by Rösler [35] who proved convergence in
distribution for the rescaled total cost of the randomized version of quicksort. The method was then further
developed by Rösler [36], Rachev and Rüschendorf [33], and later on in [9, 13, 28, 30, 31, 37] and has
permitted numerous analyses in distribution for random discrete structures.

So far, the method has mostly been used to analyze random variables taking real values, though a few
applications on function spaces have been made, see [9, 13, 21]. Here we are interested in the function
space $\mathcal{D}[0,1]$ endowed with the Skorokhod topology [see, e.g., 2], but the main idea persists: (1) devise
a recursive equation for the quantity of interest (here the process $(C_n(s), s \in [0,1])$), and (2) based on
a properly rescaled version of the quantity come up with a limit equation, i.e., a recursive distributional
equation that the limit may satisfy; (3) if the map of distributions associated to the limit equation is a
contraction in a certain metric space, then a fixed point is unique and may be obtained by iteration. The
contraction may also be exploited to obtain weak convergence to the fixed point. We now move on to the
first step of this program.



Write $I_1^{(n)}, \dots, I_4^{(n)}$ for the number of points falling in the four regions created by the point stored at
the root. Then, given the coordinates of the first data point $(U, V)$, we have, cf. Figure 1,

$$(I_1^{(n)}, \dots, I_4^{(n)}) \stackrel{d}{=} \mathrm{Mult}\big(n-1;\ UV,\ U(1-V),\ (1-U)(1-V),\ (1-U)V\big). \tag{9}$$

Observe that, for the cost inside a subregion, what matters is the location of the query line relative to the
region. Thus a decomposition at the root yields the following recursive relation, for any $n \ge 1$,

$$C_n(s) \stackrel{d}{=} 1 + \mathbf{1}_{\{s<U\}} \left[ C^{(1)}_{I_1^{(n)}}\!\left(\frac{s}{U}\right) + C^{(2)}_{I_2^{(n)}}\!\left(\frac{s}{U}\right) \right]
+ \mathbf{1}_{\{s \ge U\}} \left[ C^{(3)}_{I_3^{(n)}}\!\left(\frac{s-U}{1-U}\right) + C^{(4)}_{I_4^{(n)}}\!\left(\frac{s-U}{1-U}\right) \right], \tag{10}$$

where $U, I_1^{(n)}, \dots, I_4^{(n)}$ are the quantities already introduced and $(C_n^{(1)}), \dots, (C_n^{(4)})$ are independent copies
of the sequence $(C_n, n \ge 0)$, independent of $(U, I_1^{(n)}, \dots, I_4^{(n)})$. We stress that this equation does not
only hold true pointwise for fixed $s$ but also as cadlag functions on the unit interval. The relation in (10) is
the fundamental equation for us.
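
The recurrence (10) can be turned directly into a sampler for $C_n(s)$ at a single fixed $s$, without building the tree. The sketch below is an illustration under the random model of this paper, not an efficient implementation: the name `sample_cost` is hypothetical, and the subtree sizes are obtained by placing the remaining $n - 1$ points one by one rather than by drawing from the multinomial distribution in (9).

```python
import random


def sample_cost(n, s, rng):
    """One sample of C_n(s) following the root decomposition (10); C_0 = 0."""
    if n == 0:
        return 0
    u, v = rng.random(), rng.random()  # coordinates (U, V) of the root point
    left = s < u                       # which side of the root the line falls on
    low = high = 0
    for _ in range(n - 1):
        x, y = rng.random(), rng.random()
        if (x < u) == left:            # point lies in one of the two regions hit
            if y < v:
                low += 1
            else:
                high += 1
    # Relative position of the query line inside the two regions it crosses.
    t = s / u if left else (s - u) / (1 - u)
    return 1 + sample_cost(low, t, rng) + sample_cost(high, t, rng)
```

Note that both recursive calls receive the same relative position `t`: this is exactly the source of dependence between the two subtrees discussed in the introduction. A call such as `sample_cost(10_000, 0.3, random.Random(0))` returns one realization of the cost.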

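Letting $n \to \infty$ (formally) in (10) suggests that, if $n^{-\beta} C_n(s)$ does converge to a random variable $Z(s)$
in a sense to be made precise, then the distribution of the process $(Z(s), 0 \le s \le 1)$ should satisfy the following
fixed point equation

$$Z(s) \stackrel{d}{=} \mathbf{1}_{\{s<U\}} \left[ (UV)^{\beta} Z^{(1)}\!\left(\frac{s}{U}\right) + (U(1-V))^{\beta} Z^{(2)}\!\left(\frac{s}{U}\right) \right]
+ \mathbf{1}_{\{s \ge U\}} \left[ ((1-U)V)^{\beta} Z^{(3)}\!\left(\frac{s-U}{1-U}\right) + ((1-U)(1-V))^{\beta} Z^{(4)}\!\left(\frac{s-U}{1-U}\right) \right], \tag{11}$$

where $U$ and $V$ are independent $[0,1]$-uniform random variables and $Z^{(i)}$, $i = 1, \dots, 4$, are independent
copies of the process $Z$, which are also independent of $U$ and $V$.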

The last step leading to the fixed point equation (11) needs now to be made rigorous. It is at this point
that the contraction method enters the game. The distribution of a solution to our fixed-point equation (11)
lies in the set of probability measures on the Polish space $(\mathcal{D}[0,1], d)$, which is the set we have to endow
with a suitable metric. Here, $d$ denotes the Skorokhod metric [see, e.g., 2].

The recursive equation (10) is an example of the following, more general setting of random additive
recurrences: Let $(X_n)_{n \ge 0}$ be $\mathcal{D}[0,1]$-valued random variables with

$$X_n \stackrel{d}{=} \sum_{r=1}^{K} A_r^{(n)} X^{(r)}_{I_r^{(n)}} + b^{(n)}, \qquad n \ge 1, \tag{12}$$

where $A_1^{(n)}, \dots, A_K^{(n)}$ are random continuous linear operators on $\mathcal{D}[0,1]$, $b^{(n)}$ is a $\mathcal{D}[0,1]$-valued random
variable, $I_1^{(n)}, \dots, I_K^{(n)}$ are random integers between $0$ and $n-1$ and the sequences of processes $(X_n^{(1)}), \dots,$
$(X_n^{(K)})$ are distributed like $(X_n)$. Moreover $(A_1^{(n)}, \dots, A_K^{(n)}, b^{(n)}, I_1^{(n)}, \dots, I_K^{(n)}), (X_n^{(1)}), \dots, (X_n^{(K)})$ are
independent.

At this point, one should comment on the term random continuous linear operator: As explained
explicitly in [32], $A$ is a random continuous linear operator on $\mathcal{D}[0,1]$ if it takes values in the set of
endomorphisms on $\mathcal{D}[0,1]$ that are continuous both with respect to the supremum norm and to the Skorokhod
metric. Moreover, for any $f \in \mathcal{D}[0,1]$ and $t \in [0,1]$, the quantity $Af(t)$ has to be a real-valued random
variable, and the same is assumed for $\|A\|_{\mathrm{op}}$ (see below for the definition). Finally, we recall that
convergence $d(f_n, f) \to 0$ in the Skorokhod metric means that there exists a sequence of monotonically
increasing bijections $(\lambda_n)$ on the unit interval such that $f_n(\lambda_n(t)) \to f(t)$ and $\lambda_n(t) \to t$, both uniformly
in $t$ as $n \to \infty$.

To establish Theorem 1 as a special case of this setting we use Proposition 7 below. Proposition 7 is
part of the main convergence theorem in Neininger and Sulzbach [32]. We first state conditions needed
to deal with the general recurrence (12); we will then justify that it can indeed be used in the case of
cost of partial match queries. Consider the following assumptions, where, for a random variable $X$ in
$\mathcal{D}[0,1]$ we write $\|X\|_2 := \mathbf{E}[\|X\|^2]^{1/2}$, and for a linear operator $A$ we write $\|A\|_2 := \mathbf{E}[\|A\|_{\mathrm{op}}^2]^{1/2}$ with
$\|A\|_{\mathrm{op}} := \sup_{\|x\| = 1} \|A(x)\|$. Suppose $(X_n)$ obeys (12) and

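(A1) Convergence and contraction. We have $\|A_r^{(n)}\|_2, \|b^{(n)}\|_2 < \infty$ for all $r = 1, \dots, K$ and
$n \ge 0$, and there exist random continuous linear operators $A_1, \dots, A_K$ on $\mathcal{D}[0,1]$ and a $\mathcal{D}[0,1]$-
valued random variable $b$ such that, for some positive sequence $R(n) \downarrow 0$, as $n \to \infty$,

$$\|b^{(n)} - b\|_2 + \sum_{r=1}^{K} \|A_r^{(n)} - A_r\|_2 = O(R(n)), \tag{13}$$

and for all $\ell \in \mathbb{N}$,

$$\mathbf{E}\left[ \mathbf{1}_{\{I_r^{(n)} \in \{0, \dots, \ell\}\}}\, \|A_r^{(n)}\|_{\mathrm{op}}^2 \right] \to 0,$$

and

$$L^* = \limsup_{n \to \infty} \mathbf{E}\left[ \sum_{r=1}^{K} \|A_r^{(n)}\|_{\mathrm{op}}^2\, \frac{R(I_r^{(n)})}{R(n)} \right] < 1. \tag{14}$$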
(A2) Existence and equality of moments. $\mathbf{E}[\|X_n\|^2] < \infty$ for all $n$ and $\mathbf{E}[X_{n_1}(t)] = \mathbf{E}[X_{n_2}(t)]$
for all $n_1, n_2 \in \mathbb{N}_0$, $t \in [0,1]$.

(A3) Existence of a continuous solution. There exists a solution $X$ of the fixed-point equation

$$X \stackrel{d}{=} \sum_{r=1}^{K} A_r(X^{(r)}) + b \tag{15}$$

with continuous paths, $\mathbf{E}[\|X\|^2] < \infty$ and $\mathbf{E}[X(t)] = \mathbf{E}[X_1(t)]$ for all $t \in [0,1]$. Again the random
variables $(A_1, \dots, A_K, b), X^{(1)}, \dots, X^{(K)}$ are independent and $X^{(1)}, \dots, X^{(K)}$ are distributed like
$X$.

(A4) Perturbation condition. $X_n = W_n + h_n$ where $\|h_n - h\| \to 0$ with $h \in C[0,1]$ and random
variables $W_n$ in $\mathcal{D}[0,1]$ such that there exists a sequence $(r_n)$ with, as $n \to \infty$,

$$\mathbf{P}\left( W_n \notin \mathcal{D}_{r_n}[0,1] \right) \to 0.$$

Here, $\mathcal{D}_{r_n}[0,1] \subset \mathcal{D}[0,1]$ denotes the set of functions on the unit interval continuous at 1, for which
there is a decomposition of $[0,1]$ into intervals of length at least $r_n$ on which they are constant.

(A5) Rate of convergence. R{n) = o (log"^(l/rn)). 

The contraction method presented here for the space $(\mathcal{D}[0,1], d)$ is based on the Zolotarev metric $\zeta_2$,
see [32]. We state the part of the main convergence theorem of Neininger and Sulzbach [32] that we will use.
In the next section, we will prove our main result, Theorem 1, with the help of Proposition 7.

Proposition 7. Let $X_n$ fulfill (12). Provided that Assumptions (A1)-(A3) are satisfied, the solution $X$ of the
fixed-point equation (15) is unique.
i. For all $t \in [0,1]$, $X_n(t) \to X(t)$ in distribution, with convergence of the first two moments;
ii. If $\xi$ is independent of $(X_n)$, $X$ and distributed on $[0,1]$ then $X_n(\xi) \to X(\xi)$ in distribution, again
with convergence of the first two moments.
iii. If also (A4) and (A5) hold, then $X_n \to X$ in distribution in $(\mathcal{D}[0,1], d)$.

Note that $X_n \to X$ in distribution in $(\mathcal{D}[0,1], d)$ with $X$ having continuous sample paths implies that
we can find versions of $(X_n)$, $X$ on a suitable probability space such that $\|X_n - X\| \to 0$ almost surely.
However, in general we do not have $X_n \to X$ in distribution in $\mathcal{D}[0,1]$ endowed with the uniform topology
due to problems with measurability, see [2, Section 14] and [32, Section 2.2].



3.2 The functional limit theorem: Proof of Theorem 1 



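The aim of this section is to prove Theorem 1 with the help of Proposition 7 from Neininger and Sulzbach
[32]. More precisely, in the following we prove all the conditions (A1)-(A5) except two which require
much more work: the existence of a continuous solution (A3), and the uniform convergence of the mean in
(A1) are treated separately in Sections 4 and 5, respectively.

Following the heuristics in the introduction we scale the additive recurrence (10) by $n^{\beta}$. Let $Q_0(t) := 0$
and

$$Q_n(t) = \frac{C_n(t)}{K_1 n^{\beta}}, \qquad n \ge 1.$$

The recursive distributional equation then rewrites in terms of $Q_n$ as

$$(Q_n(t))_{t \in [0,1]} \stackrel{d}{=} \Bigg( \mathbf{1}_{\{t<U\}} \left[ \left(\frac{I_1^{(n)}}{n}\right)^{\beta} Q^{(1)}_{I_1^{(n)}}\!\left(\frac{t}{U}\right) + \left(\frac{I_2^{(n)}}{n}\right)^{\beta} Q^{(2)}_{I_2^{(n)}}\!\left(\frac{t}{U}\right) \right]
+ \mathbf{1}_{\{t \ge U\}} \left[ \left(\frac{I_3^{(n)}}{n}\right)^{\beta} Q^{(3)}_{I_3^{(n)}}\!\left(\frac{t-U}{1-U}\right) + \left(\frac{I_4^{(n)}}{n}\right)^{\beta} Q^{(4)}_{I_4^{(n)}}\!\left(\frac{t-U}{1-U}\right) \right]
+ \frac{1}{K_1 n^{\beta}} \Bigg)_{t \in [0,1]}, \tag{16}$$

where $U, I_1^{(n)}, \dots, I_4^{(n)}$ are the quantities already introduced in Section 3.1 and (9), and $(Q_n^{(1)})_{n \ge 0}, \dots,$
$(Q_n^{(4)})_{n \ge 0}$ are independent copies of $(Q_n)_{n \ge 0}$, independent of $(U, I_1^{(n)}, \dots, I_4^{(n)})$. The convergence of
the coefficients $(I_r^{(n)}/n)^{\beta}$ suggests that a limit of $Q_n(t)$ satisfies the fixed-point equation (11).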

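The recurrence relation. Most of the work consists in setting up the right form of the recurrence relation: for
(A2) to be satisfied, we need to use a scaling that leads to an expectation which is independent of $n$. This is
not the case for $Q_n(t)$. Denoting $\mu_n(t) = \mathbf{E}[C_n(t)]$, we are naturally led to consider $Y_0(t) := 0$ and

$$Y_n(t) = \frac{C_n(t) - \mu_n(t)}{K_1 n^{\beta}} = Q_n(t) - h(t) + O(n^{-\varepsilon}), \qquad n \ge 1,$$

where the error term is deterministic and uniform in $t \in [0,1]$. Hence it is sufficient to prove convergence
of the sequence $(Y_n)_{n \ge 1}$. The distributional recursion in terms of $Y_n$ is

$$(Y_n(t))_{t \in [0,1]} \stackrel{d}{=} \Bigg( \mathbf{1}_{\{t<U\}} \left[ \left(\frac{I_1^{(n)}}{n}\right)^{\beta} Y^{(1)}_{I_1^{(n)}}\!\left(\frac{t}{U}\right) + \left(\frac{I_2^{(n)}}{n}\right)^{\beta} Y^{(2)}_{I_2^{(n)}}\!\left(\frac{t}{U}\right) \right]
+ \mathbf{1}_{\{t \ge U\}} \left[ \left(\frac{I_3^{(n)}}{n}\right)^{\beta} Y^{(3)}_{I_3^{(n)}}\!\left(\frac{t-U}{1-U}\right) + \left(\frac{I_4^{(n)}}{n}\right)^{\beta} Y^{(4)}_{I_4^{(n)}}\!\left(\frac{t-U}{1-U}\right) \right]$$
$$\qquad + \mathbf{1}_{\{t<U\}}\, \frac{\mu_{I_1^{(n)}}\!\left(\frac{t}{U}\right) + \mu_{I_2^{(n)}}\!\left(\frac{t}{U}\right)}{K_1 n^{\beta}}
+ \mathbf{1}_{\{t \ge U\}}\, \frac{\mu_{I_3^{(n)}}\!\left(\frac{t-U}{1-U}\right) + \mu_{I_4^{(n)}}\!\left(\frac{t-U}{1-U}\right)}{K_1 n^{\beta}}
+ \frac{1 - \mu_n(t)}{K_1 n^{\beta}} \Bigg)_{t \in [0,1]},$$

where $(Y_n^{(1)})_{n \ge 0}, \dots, (Y_n^{(4)})_{n \ge 0}$ are independent copies of $(Y_n)_{n \ge 0}$ which are also independent of the
vector $(U, I_1^{(n)}, \dots, I_4^{(n)})$. Therefore, any possible limit $Y$ of $Y_n$ should satisfy the following distributional
fixed-point equation

$$(Y(t))_{t \in [0,1]} \stackrel{d}{=} \Bigg( \mathbf{1}_{\{t<U\}} \left[ (UV)^{\beta} Y^{(1)}\!\left(\frac{t}{U}\right) + (U(1-V))^{\beta} Y^{(2)}\!\left(\frac{t}{U}\right) \right]
+ \mathbf{1}_{\{t \ge U\}} \left[ ((1-U)V)^{\beta} Y^{(3)}\!\left(\frac{t-U}{1-U}\right) + ((1-U)(1-V))^{\beta} Y^{(4)}\!\left(\frac{t-U}{1-U}\right) \right]$$
$$\qquad + \mathbf{1}_{\{t<U\}} \big( (UV)^{\beta} + (U(1-V))^{\beta} \big)\, h\!\left(\frac{t}{U}\right)
+ \mathbf{1}_{\{t \ge U\}} \big( ((1-U)V)^{\beta} + ((1-U)(1-V))^{\beta} \big)\, h\!\left(\frac{t-U}{1-U}\right) - h(t) \Bigg)_{t \in [0,1]}. \tag{17}$$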



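Having Proposition 7 in mind, we define (random) operators $A_r^{(n)}$, $r = 1, 2, 3, 4$, by

$$A_r^{(n)}(f)(t) = \begin{cases}
\mathbf{1}_{\{t<U\}} \left(\dfrac{I_r^{(n)}}{n}\right)^{\beta} f\!\left(\dfrac{t}{U}\right) & \text{if } r = 1, 2, \\[2ex]
\mathbf{1}_{\{t \ge U\}} \left(\dfrac{I_r^{(n)}}{n}\right)^{\beta} f\!\left(\dfrac{t-U}{1-U}\right) & \text{if } r = 3, 4.
\end{cases}$$

Furthermore let $b^{(n)}(t) = \sum_{r=1}^{4} b_r^{(n)}(t) + (1 - \mu_n(t))/(K_1 n^{\beta})$ with

$$b_r^{(n)}(t) = \begin{cases}
\mathbf{1}_{\{t<U\}}\, \dfrac{\mu_{I_r^{(n)}}\!\left(\frac{t}{U}\right)}{K_1 n^{\beta}} & \text{if } r = 1, 2, \\[2ex]
\mathbf{1}_{\{t \ge U\}}\, \dfrac{\mu_{I_r^{(n)}}\!\left(\frac{t-U}{1-U}\right)}{K_1 n^{\beta}} & \text{if } r = 3, 4.
\end{cases}$$

Then the finite-$n$ version of the recurrence relation for $(Y_n)_{n \ge 0}$ is precisely of the form (12).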
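We define similarly the coefficients of the limit recursive equation (17). We will then show that with
these definitions, the assumptions (A1)-(A5) are satisfied (again, except the existence of a continuous limit
solution and the uniform convergence for the mean treated in Sections 4 and 5). The operators $A_1, \dots, A_4$
are defined by

$$A_1(f)(t) = \mathbf{1}_{\{t<U\}} (UV)^{\beta} f\!\left(\frac{t}{U}\right), \qquad
A_2(f)(t) = \mathbf{1}_{\{t<U\}} (U(1-V))^{\beta} f\!\left(\frac{t}{U}\right),$$
$$A_3(f)(t) = \mathbf{1}_{\{t \ge U\}} ((1-U)V)^{\beta} f\!\left(\frac{t-U}{1-U}\right), \qquad
A_4(f)(t) = \mathbf{1}_{\{t \ge U\}} ((1-U)(1-V))^{\beta} f\!\left(\frac{t-U}{1-U}\right),$$

and $b(t) = \sum_{r=1}^{4} b_r(t) - h(t)$ with

$$b_1(t) = \mathbf{1}_{\{t<U\}} (UV)^{\beta} h\!\left(\frac{t}{U}\right), \qquad
b_2(t) = \mathbf{1}_{\{t<U\}} (U(1-V))^{\beta} h\!\left(\frac{t}{U}\right),$$
$$b_3(t) = \mathbf{1}_{\{t \ge U\}} ((1-U)V)^{\beta} h\!\left(\frac{t-U}{1-U}\right), \qquad
b_4(t) = \mathbf{1}_{\{t \ge U\}} ((1-U)(1-V))^{\beta} h\!\left(\frac{t-U}{1-U}\right).$$

The operators $A_1, \dots, A_4, A_1^{(n)}, \dots, A_4^{(n)}$ are linear for each $n$. Moreover, their operator norms are bounded
above by one, which implies that they are norm-continuous. Their norm functions are real-valued random variables.
In order to establish that they are indeed random continuous linear operators on $(\mathcal{D}[0,1], d)$ it remains to check
that they are continuous with respect to the Skorokhod topology. To this end, it is sufficient to prove that

$$d(f_n, f) \to 0 \ \Longrightarrow\ d\left( \mathbf{1}_{\{t<u\}} f_n\!\left(\frac{t}{u}\right),\ \mathbf{1}_{\{t<u\}} f\!\left(\frac{t}{u}\right) \right) \to 0$$

for any $u \in [0,1]$. This follows easily since $\|f_n(\lambda_n(t)) - f(t)\| \to 0$ with monotonically increasing
bijections $\lambda_n$ on the unit interval such that $\|\lambda_n(t) - t\| \to 0$ implies
$\|\mathbf{1}_{\{\beta_n(t)<u\}} f_n(\beta_n(t)/u) - \mathbf{1}_{\{t<u\}} f(t/u)\| \to 0$ where $\beta_n(t) = u \lambda_n(t/u)$.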

We are now ready to check that the assumptions (A1)-(A5) indeed hold, taking the results of Sections 4
and 5 for granted.

(A3) Existence of a continuous solution. In Section 4, we construct a continuous solution $Z$ of
the fixed-point equation (11) with $\mathbf{E}[\|Z\|^2] < \infty$ and $\mathbf{E}[Z(t)] = h(t) = (t(1-t))^{\beta/2}$. Hence the function
$Y(t) = Z(t) - h(t)$ is a continuous solution of (17) with $\mathbf{E}[Y(t)] = 0$ and $\mathbf{E}[\|Y\|^2] < \infty$. A direct
computation shows that $\mathbf{E}[\|A_r\|_{\mathrm{op}}^2] = \mathbf{E}[(UV)^{2\beta}] = (2\beta+1)^{-2}$ for $r = 1, \dots, 4$. Observe that

$$L := \sum_{r=1}^{4} \mathbf{E}[\|A_r\|_{\mathrm{op}}^2] = \frac{4}{(2\beta+1)^2} < 1.$$

In particular, $Y$ is the unique solution of (17) with $\mathbf{E}[Y(t)] = 0$ and $\mathbf{E}[\|Y\|^2] < \infty$.

(A2) Existence and equality of moments. The precise scaling we chose ensures that $\mathbf{E}[Y_n(t)] = 0$
for all $n \ge 1$ and $t \in [0,1]$. The second moments $\mathbf{E}[\|Y_n\|^2]$ are finite as the random variables $\|Y_n\|$ are
bounded for every fixed $n$.

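(A1) Convergence and contraction. It suffices to focus on the terms

$$\|A_1^{(n)} - A_1\|_2 \quad \text{and} \quad \|b_1^{(n)} - b_1\|_2;$$

the remaining terms can obviously be treated in the same way. Establishing the convergence boils
down to verifying that a binomial random variable $\mathrm{Bin}(n,p)$ is properly approximated by $np$. Using the
Chernoff-Hoeffding inequality for binomials [23], one easily verifies that for every $\alpha > 0$,

$$\mathbf{E}\left[ \left| \frac{\mathrm{Bin}(n,p)}{n} - p \right|^{\alpha} \right] = O(n^{-\alpha/2}) \tag{18}$$

uniformly in $p \in [0,1]$. Thus, since $|x^{\beta} - y^{\beta}| \le |x - y|^{\beta}$ for any $x, y \in [0,1]$, we have

$$\|A_1^{(n)} - A_1\|_2 \le \left\| \left(\frac{I_1^{(n)}}{n}\right)^{\beta} - (UV)^{\beta} \right\|_2 = O(n^{-\beta/2}). \tag{19}$$

By Proposition 12 we have $\mu_n(t) = K_1 h(t) n^{\beta} + O(n^{\beta - \varepsilon})$ uniformly in $t \in [0,1]$. Therefore

$$\|b_1^{(n)} - b_1\|_2 \le \left\| \mathbf{1}_{\{t<U\}}\, h\!\left(\frac{t}{U}\right) \left( \left(\frac{I_1^{(n)}}{n}\right)^{\beta} - (UV)^{\beta} \right) \right\|_2 + C \left\| \frac{(I_1^{(n)})^{\beta - \varepsilon}}{n^{\beta}} \right\|_2$$

for some constant $C \in (0, \infty)$. Since $h$ is bounded, the first summand is $O(n^{-\beta/2})$ just like in (19)
above. The second term is trivially bounded by $C n^{-\varepsilon}$. Overall, we have $\|b_1^{(n)} - b_1\|_2 = O(n^{-\varepsilon})$. Hence, since the
coefficients $A_r^{(n)}$ are bounded by one in the operator norm and by distributional properties of $I_r^{(n)}$,
the first two constraints in Assumption (A1) are satisfied with $R(n) = C n^{-\varepsilon}$ for a suitable constant $C > 0$,
and $\varepsilon > 0$ may still be chosen as small as we want.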

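Next, we consider $L^*$ in (A1). By dominated convergence we have

$$L^* = \limsup_{n \to \infty} \mathbf{E}\left[ \sum_{r=1}^{4} \left(\frac{I_r^{(n)}}{n}\right)^{2\beta} \frac{R(I_r^{(n)})}{R(n)} \right]
= 4\, \mathbf{E}\left[ (UV)^{2\beta} (UV)^{-\varepsilon} \right] = \frac{4}{(2\beta - \varepsilon + 1)^2} < 1$$

for $\varepsilon > 0$ sufficiently small. This completes the verification of (A1).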

(A4) Perturbation condition. Note that Qn is piecewise constant: Qn{t) = Qn{s) for all s,t if no 
x-coordinate of the first n points lies between s and t. There are n independent points, the probability that 
there exists two lying within of each other is at most n~^. So (A4) is satisfied with = n~^. 



(A5) Rate of convergence. With 



and Rn = Cn ^, we have Rn = o{log ^ n) 



o(log (l/r^)) Therefore, the condition on the rate of convergence is satisfied. 



4 The limit process 

In this section, we prove the existence of a process $Z \in C[0,1]$, the space of continuous functions from
$[0,1]$ into $\mathbb{R}$, that satisfies the distributional fixed point equation (11) and whose mean matches the
asymptotic mean of the rescaled version of $C_n(s)$. Figure 2 shows a simulation of the process $Z$.

We construct the process $Z$ as the pointwise limit of martingales. We then show that the convergence
is actually almost surely uniform, which allows us to conclude that $Z \in C[0,1]$ with probability one.

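We identify the nodes of the infinite 4-ary tree with the set of finite words on the alphabet $\{1, 2, 3, 4\}$,

$$T = \bigcup_{n \ge 0} \{1, 2, 3, 4\}^{n}.$$

For a node $u \in T$, we write $|u|$ for its depth, i.e. the distance between $u$ and the root $\emptyset$. The descendants
of $u \in T$ correspond to all the words in $T$ with prefix $u$; in particular, the children of $u$ are $u1, \dots, u4$.
Let $\{U_v, v \in T\}$ and $\{V_v, v \in T\}$ be two independent families of i.i.d. $[0,1]$-uniform random variables.
By $C_0[0,1]$ we denote the set of continuous functions on the unit interval vanishing at the boundary, i.e.
$f(0) = f(1) = 0$ for $f \in C_0[0,1]$. Define the continuous operator $G : (0,1)^2 \times C_0[0,1]^4 \to C_0[0,1]$ by

$$G(x, y, f_1, f_2, f_3, f_4)(s) = \mathbf{1}_{\{s<x\}} \left[ (xy)^{\beta} f_1\!\left(\frac{s}{x}\right) + (x(1-y))^{\beta} f_2\!\left(\frac{s}{x}\right) \right]
+ \mathbf{1}_{\{s \ge x\}} \left[ ((1-x)y)^{\beta} f_3\!\left(\frac{s-x}{1-x}\right) + ((1-x)(1-y))^{\beta} f_4\!\left(\frac{s-x}{1-x}\right) \right]. \tag{20}$$

Let $h$ be the map defined by $h(s) = (s(1-s))^{\beta/2}$ where $2\beta = \sqrt{17} - 3$. For every node $u \in T$, let
$Z_0^{u} = h$. Then define recursively

$$Z_{n+1}^{u} = G\left( U_u, V_u, Z_n^{u1}, Z_n^{u2}, Z_n^{u3}, Z_n^{u4} \right). \tag{21}$$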
Finally, define $Z_n = Z_n^{\emptyset}$ to be the value observed at the root of $T$ when the iteration has been started
with $h$ in all the nodes at level $n$. We will see that for every $s \in [0,1]$, the sequence $(Z_n(s), n \ge 0)$ is a
non-negative discrete time martingale; so it converges with probability one to a finite limit.
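
For a single fixed $s$, the iteration (21) can be unrolled into a short recursive sampler, since only the two children whose regions are hit by the line at $s$ contribute at each node. The Python sketch below illustrates this; the name `sample_Z` is hypothetical, fresh uniform variables are drawn at every visited node (so the explored subtrees are independent), and values at different points $s$ obtained from separate calls are not coupled, so this does not produce a sample path of the process.

```python
import math
import random

BETA = (math.sqrt(17) - 3) / 2


def h(s):
    """h(s) = (s(1-s))^(beta/2), with s clamped to [0, 1] against rounding."""
    s = min(max(s, 0.0), 1.0)
    return (s * (1 - s)) ** (BETA / 2)


def sample_Z(n, s, rng):
    """One sample of Z_n(s): n iterations of the operator G, started from h."""
    if n == 0:
        return h(s)
    u, v = rng.random(), rng.random()
    if s < u:
        t = s / u
        return ((u * v) ** BETA * sample_Z(n - 1, t, rng)
                + (u * (1 - v)) ** BETA * sample_Z(n - 1, t, rng))
    t = (s - u) / (1 - u)
    return (((1 - u) * v) ** BETA * sample_Z(n - 1, t, rng)
            + ((1 - u) * (1 - v)) ** BETA * sample_Z(n - 1, t, rng))
```

Each call evaluates $2^n$ terms, so moderate values of $n$ (say up to 15) are practical. Averaging many samples of `sample_Z(n, s, rng)` should stay close to `h(s)` for every `n`, in line with the martingale property.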

It will be convenient to have an explicit representation for $Z_n$. For $s \in [0,1]$, $Z_n(s)$ is the sum of
exactly $2^n$ terms, each one being the contribution of one of the boxes at level $n$ that is cut by the line at $s$.
Let $\{Q_i^n(s), 1 \le i \le 2^n\}$ be the set of rectangles at level $n$ whose first coordinate intersects $s$. Suppose that
the projection of $Q_i^n(s)$ on the first coordinate yields the interval $[\ell_i^n, r_i^n]$. Then

$$Z_n(s) = \sum_{i=1}^{2^n} \mathrm{Leb}(Q_i^n(s))^{\beta} \cdot h\!\left( \frac{s - \ell_i^n}{r_i^n - \ell_i^n} \right), \tag{22}$$

where $\mathrm{Leb}(Q_i^n(s))$ denotes the volume of the rectangle $Q_i^n(s)$. The difference between $Z_n$ and $Z_{n+1}$ only
lies in the functions appearing in the boxes $Q_i^n(s)$: We have

$$Z_{n+1}(s) - Z_n(s) = \sum_{i=1}^{2^n} \mathrm{Leb}(Q_i^n(s))^{\beta} \left[ G(U_i', V_i', h, h, h, h)\!\left( \frac{s - \ell_i^n}{r_i^n - \ell_i^n} \right) - h\!\left( \frac{s - \ell_i^n}{r_i^n - \ell_i^n} \right) \right], \tag{23}$$

where $U_i', V_i'$, $1 \le i \le 2^n$, are i.i.d. $[0,1]$-uniform random variables. In fact, $U_i'$ and $V_i'$ are some of the
variables $U_u, V_u$ for nodes $u$ at level $n$. Observe that, although $\mathrm{Leb}(Q_i^n(s))$ is not a product of $n$ independent
terms of the form $UV$ because of size-biasing, $U_i', V_i'$ are in fact unbiased, i.e. uniform. Let $\mathcal{F}_n$ denote
the $\sigma$-algebra generated by $\{U_u, V_u : |u| < n\}$. Then the family $\{U_i', V_i' : 1 \le i \le 2^n\}$ is independent of
$\mathcal{F}_n$. So, to prove that $Z_n(s)$ is a martingale, it suffices to prove that, for $1 \le i \le 2^n$,

$$\mathbf{E}\left[ G(U_i', V_i', h, h, h, h)(s) \,\middle|\, \mathcal{F}_n \right] = h(s).$$

Since $U_i', V_i'$, $1 \le i \le 2^n$, are independent of $\mathcal{F}_n$, this clearly reduces to the following lemma.

Lemma 8. For the operator $G$ defined in (20) and $U$, $V$ two independent $[0, 1]$-uniform random variables, and any $s \in [0, 1]$, we have

$$\mathbf{E}[G(U, V, h, h, h, h)(s)] = h(s).$$



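Proof. Since $V$ and $1 - V$ have the same distribution, we have

$$\mathbf{E}[G(U, V, h, h, h, h)(s)] = 2\,\mathbf{E}\Big[\mathbf{1}_{\{s<U\}}(UV)^\beta h\Big(\frac{s}{U}\Big)\Big] + 2\,\mathbf{E}\Big[\mathbf{1}_{\{s\ge U\}}((1-U)V)^\beta h\Big(\frac{s-U}{1-U}\Big)\Big].$$

Similarly, since $U$ and $1 - U$ are both uniform and $h$ is symmetric about $1/2$, we clearly have

$$\mathbf{E}[G(U, V, h, h, h, h)(s)] = 2f(s) + 2f(1 - s),$$

where we wrote $f(s) = \mathbf{E}[\mathbf{1}_{\{s<U\}}(UV)^\beta h(s/U)]$. To complete the proof, it suffices to compute $f(s)$. We have

$$f(s) = \mathbf{E}\Big[\mathbf{1}_{\{s<U\}}(UV)^\beta h\Big(\frac{s}{U}\Big)\Big] = \frac{1}{\beta+1}\int_s^1 x^\beta \Big(\frac{s}{x}\Big)^{\beta/2}\Big(1-\frac{s}{x}\Big)^{\beta/2} dx = \frac{1}{\beta+1}\, s^{\beta/2} \int_s^1 (x-s)^{\beta/2}\, dx$$

$$= \frac{2\, s^{\beta/2}(1-s)^{\beta/2+1}}{(\beta+1)(\beta+2)} = \frac{1}{2}(1-s)h(s),$$

where the last line follows since $(\beta+1)(\beta+2) = 4$ by definition of $\beta$. It then follows easily that

$$\mathbf{E}[G(U, V, h, h, h, h)(s)] = (1-s)h(s) + s\,h(1-s) = h(s),$$

which completes the proof. □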
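
The identity of Lemma 8 is easy to check numerically. The sketch below assumes only the definitions of $\beta$ and $h$ and the reduction $\mathbf{E}[G(U,V,h,h,h,h)(s)] = 2f(s) + 2f(1-s)$ used in the proof; the function names and the midpoint quadrature are illustrative choices.

```python
import math

BETA = (math.sqrt(17) - 3) / 2  # 2*beta = sqrt(17) - 3


def h(s):
    return (s * (1 - s)) ** (BETA / 2)


def f_numeric(s, steps=20000):
    """Midpoint rule for f(s) = E[1{s<U} (UV)^beta h(s/U)].

    The V-integral is done in closed form: E[V^beta] = 1/(beta+1).
    """
    width = (1 - s) / steps
    total = 0.0
    for i in range(steps):
        x = s + (i + 0.5) * width
        total += x ** BETA * h(s / x)
    return total * width / (BETA + 1)


def expected_G(s):
    """E[G(U, V, h, h, h, h)(s)] via the reduction 2 f(s) + 2 f(1 - s)."""
    return 2 * f_numeric(s) + 2 * f_numeric(1 - s)
```

Comparing `expected_G(s)` with `h(s)` on a few values of `s`, and `(BETA + 1) * (BETA + 2)` with 4, gives a quick sanity check of the two facts the proof relies on.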
Figure 2: A simulation of the limit process using the martingale



Our aim is now to prove the following proposition:

Proposition 9. With probability one, $Z_n$ converges uniformly to some continuous limit process $Z$ on $[0, 1]$.

Assume for the moment that there exist constants $a, b \in (0, 1)$ and $C$ such that

$$\mathbf{P}\Big(\sup_{s\in[0,1]} |Z_{n+1}(s) - Z_n(s)| \ge a^n\Big) \le C \cdot b^n. \tag{24}$$

Then, by the Borel-Cantelli lemma, the sequence $(Z_n)$ is almost surely Cauchy with respect to the supremum norm. Completeness of $(C[0, 1], \|\cdot\|)$ yields the existence of a random process $Z$ with continuous paths such that $Z_n \to Z$ uniformly on $[0, 1]$.

We now move on to showing that there exist constants $a$ and $b$ such that (24) is satisfied. We start with a bound for a fixed value $s \in [0, 1]$. We will then handle the supremum using a sieve of the interval $[0, 1]$ by a large enough number of deterministic points.

Lemma 10. For every $s \in [0, 1]$, any $a \in (0, 1)$, and any integer $n$ large enough, we have the bound

$$\mathbf{P}\big(|Z_{n+1}(s) - Z_n(s)| \ge a^n\big) \le 4\,(16 e \log(1/a))^n.$$

Proof. We use the representation (23). As we have already pointed out earlier (Lemma 8), for every single rectangle at level $n$, we have

$$\mathbf{E}\big[G(U_i', V_i', h, h, h, h) - h \,\big|\, \mathcal{F}_n\big] = 0.$$

Since $h(x) \le 2^{-\beta}$ for $x \in (0, 1)$, conditional on $\mathcal{F}_n$, $Z_{n+1}(s) - Z_n(s)$ is a sum of $2^n$ centered, bounded and moreover independent terms (but not identically distributed). Moreover, conditional on $\mathcal{F}_n$, the term corresponding to $Q_i^n(s)$ in (23) is bounded by

$$\mathrm{Leb}(Q_i^n)^\beta \cdot \|G(U_i', V_i', h, h, h, h) - h\| \le \mathrm{Leb}(Q_i^n)^\beta\, 2\|h\| = \mathrm{Leb}(Q_i^n)^\beta\, 2^{1-\beta}. \tag{25}$$

So when conditioning on $\mathcal{F}_n$ one can bound the variations of $Z_{n+1} - Z_n$ using the Chernoff-Hoeffding inequality [23]. We have

$$\mathbf{P}\big(|Z_{n+1}(s) - Z_n(s)| \ge a^n\big) = \mathbf{E}\big[\mathbf{P}\big(|Z_{n+1}(s) - Z_n(s)| \ge a^n \mid \mathcal{F}_n\big)\big]$$

$$\le \mathbf{E}\Big[2\exp\Big(-\frac{a^{2n}}{\sum_{i=1}^{2^n} \mathrm{Leb}(Q_i^n(s))^{2\beta}}\Big)\Big]$$

$$\le 2\exp(-a^{-2n}) + 2\,\mathbf{P}\Big(\sum_{i=1}^{2^n} \mathrm{Leb}(Q_i^n(s))^{2\beta} \ge a^{4n}\Big); \tag{26}$$

the precise constant in the exponent in the second inequality can be taken to be one since $2/(2^{1-\beta})^2 \ge 1$. Now, since $2\beta > 1$ and all the volumes $\mathrm{Leb}(Q_i^n(s))$ are at most one, we have

$$\mathbf{P}\Big(\sum_{i=1}^{2^n} \mathrm{Leb}(Q_i^n(s))^{2\beta} \ge a^{4n}\Big) \le \mathbf{P}\big(W_n \ge a^{4n}\big), \tag{27}$$

where $W_n$ denotes the maximum width of any of the $4^n$ cells at level $n$. Indeed, the volume occupied by all rectangles $Q_i^n(s)$, $1 \le i \le 2^n$, together is at most that of a vertical tube of width $W_n$. Putting together (26) and (27), it follows that

$$\mathbf{P}\big(|Z_{n+1}(s) - Z_n(s)| \ge a^n\big) \le 2\exp(-a^{-2n}) + 2\,\mathbf{P}\big(W_n \ge a^{4n}\big)$$

$$\le 2\exp(-a^{-2n}) + 2\,(16 e \log(1/a))^n \le 4\,(16 e \log(1/a))^n,$$

for all $n$ large enough, using Lemma 26 from the appendix. □



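Now that we have good control on pointwise variations of $Z_{n+1} - Z_n$, we move on to the supremum on $[0, 1]$. Consider the set $V_n$ of $x$-coordinates of the vertical boundaries of all the rectangles at level $n$. Let $L_n = \inf\{|x - y| : x \ne y \in V_n\}$. Suppose that $1/\gamma$ is an integer. Then, we have

$$\sup_{s\in[0,1]} |Z_{n+1}(s) - Z_n(s)| \le \sup_{1\le i\le \gamma^{-n}}\ \sup_{i\gamma^n \le s \le (i+1)\gamma^n} |Z_{n+1}(s) - Z_n(s)|$$

$$\le \sup_{1\le i\le \gamma^{-n}} |Z_{n+1}(i\gamma^n) - Z_n(i\gamma^n)| + 2 \sup_{m\in\{n,n+1\}}\ \sup_{|s-t|\le \gamma^n} |Z_m(s) - Z_m(t)|.$$

We first deal with the second term, and suppose that we are on the event that $L_{n+1} \ge (4\gamma)^n$. Observe that the sieve we used, $\gamma^n$, is much finer than the shortest length of a cell at level $n + 1$, which is at least $L_{n+1}$. We use the representation in (22); for $|t - s| \le \gamma^n$, the two collections $\{Q_i^n(s), 1 \le i \le 2^n\}$ and $\{Q_i^n(t), 1 \le i \le 2^n\}$ differ in at most one cell. We obtain, for any $|s - t| \le \gamma^n$,

$$|Z_n(s) - Z_n(t)| \le \sum_{i=1}^{2^n} \mathrm{Leb}(Q_i^n(s))^\beta \Big|h\Big(\frac{s-\ell_i^n}{r_i^n-\ell_i^n}\Big) - h\Big(\frac{t-\ell_i^n}{r_i^n-\ell_i^n}\Big)\Big| + 2\max_i \mathrm{Leb}(Q_i^n(s))^\beta$$

$$\le \sum_{i=1}^{2^n} \mathrm{Leb}(Q_i^n(s))^\beta \cdot 4^{-\beta n/2} + 2\max_i \mathrm{Leb}(Q_i^n(s))^\beta \le 3 W_n^\beta.$$

Here, the second inequality follows from the fact that $|h(t) - h(s)| \le |t - s|^{\beta/2}$ for any $s, t \in [0, 1]$ and the fact that $r_i^n - \ell_i^n \ge L_n \ge (4\gamma)^n$. In particular, it follows by the union bound that, for any $\gamma \in (0, 1)$,

$$\mathbf{P}\Big(\sup_{s\in[0,1]} |Z_{n+1}(s) - Z_n(s)| \ge 2a^n\Big) \le \gamma^{-n} \sup_{s\in[0,1]} \mathbf{P}\big(|Z_{n+1}(s) - Z_n(s)| \ge a^n\big) + \mathbf{P}\big(L_{n+1} < (4\gamma)^n\big) + \mathbf{P}\big(12 W_n^\beta \ge a^n\big). \tag{28}$$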
We are now ready to complete the proof of Proposition 9. From (28) and Lemma 28 from the appendix, we have

$$\mathbf{P}\Big(\sup_{s\in[0,1]} |Z_{n+1}(s) - Z_n(s)| \ge 2a^n\Big) \le 4\,(16 e \gamma^{-1}\log(1/a))^n + 6\cdot 16^n \gamma^{n/2} + \big(4e\log(12^{1/n}/a)/\beta\big)^n,$$

for all $\gamma < \gamma_0$ and $n \ge n_0(\gamma)$. Now let $\gamma = \gamma(a)$ be defined by $e\gamma^{-1}\log(1/a) = \gamma^{1/2}$ and choose $a < 1$ sufficiently close to 1 such that $\gamma(a) < \gamma_0$ and $16\,(e\log(1/a))^{1/3} < 1/4$. It follows that, for $n$ sufficiently large,

$$\mathbf{P}\Big(\sup_{s\in[0,1]} |Z_{n+1}(s) - Z_n(s)| \ge 2a^n\Big) \le 11 \cdot 4^{-n}.$$

Increasing $a < 1$ and $C$ ensures that (24) holds with $b = 1/4$ for all $n \ge 1$.

The functions at the four children of the root, $Z_n^1, \ldots, Z_n^4$, are distributed as $Z_{n-1}$; they also converge uniformly to continuous limits denoted $Z^{(1)}, \ldots, Z^{(4)}$. The random functions $Z^{(1)}, \ldots, Z^{(4)}$ are independent and distributed as $Z$. Equation (21) and independence imply

$$Z(s) = \mathbf{1}_{\{s<U\}}\Big[(UV)^\beta Z^{(1)}\Big(\frac{s}{U}\Big) + (U(1-V))^\beta Z^{(2)}\Big(\frac{s}{U}\Big)\Big] + \mathbf{1}_{\{s\ge U\}}\Big[((1-U)V)^\beta Z^{(3)}\Big(\frac{s-U}{1-U}\Big) + ((1-U)(1-V))^\beta Z^{(4)}\Big(\frac{s-U}{1-U}\Big)\Big]$$

almost surely, considered as random continuous paths. In particular, the distribution of $Z$ solves the distributional fixed-point equation (11).

Finally, we look at the moments of $\|Z_n\| = \sup_{s\in[0,1]} |Z_n(s)|$ and $\|Z\| = \sup_{s\in[0,1]} |Z(s)|$.

Proposition 11. For every $p \ge 1$, we have $\mathbf{E}[\|Z\|^p] < \infty$ and $\|Z_n - Z\| \to 0$ in $L^p$.

Proof. Let $A_n(x) = \mathbf{P}(\|Z_{n+1} - Z_n\| \ge x)$ and $a < 1$, $C > 0$ such that (24) is satisfied with $b = 1/4$. Then, by the upper bound (25), we have



E[||Z,+i-Z,||]= / An{x)dx = 

Jo Jo 



An{x)dx = / An{x)dx -\- / A{x)dx. 



,1-/3 



The first summand is at most a^, the second one at most 2^ • 4 by (24). Altogether, there exists 
$R > 0$ and $0 < q < 1$ with

$$\mathbf{E}[\|Z_{n+1} - Z_n\|] \le R\, q^n$$

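for all $n$. By the same argument, one easily shows that the $p$-th moment of $\|Z_{n+1} - Z_n\|$ is also exponentially small in $n$ for any $p \ge 1$. Then, using Minkowski's inequality,

$$\mathbf{E}[\|Z_n\|^p]^{1/p} = \mathbf{E}\Big[\Big\|\sum_{k=1}^{n} (Z_k - Z_{k-1}) + h\Big\|^p\Big]^{1/p} \le \mathbf{E}\Big[\Big(\sum_{k=1}^{n} \|Z_k - Z_{k-1}\| + \|h\|\Big)^p\Big]^{1/p} \le \sum_{k=1}^{n} \mathbf{E}[\|Z_k - Z_{k-1}\|^p]^{1/p} + \|h\|,$$

which is uniformly bounded in $n$. So $Z_n$ is a martingale bounded in all $L^p$; it follows that $\mathbf{E}[\|Z\|^p] < \infty$ for all $p$, and that $\mathbf{E}[\|Z_n - Z\|^p] \to 0$ as $n \to \infty$. □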
5 Uniform convergence of the mean

The proof that Assumption (A1) holds for Proposition 7 requires showing convergence of the first moment $n^{-\beta}\mathbf{E}[C_n(s)]$ towards $\mu_1(s) = K_1 (s(1-s))^{\beta/2}$ uniformly on $[0, 1]$. Note that, since $C_n(s)$ is continuous at any fixed $s \in [0, 1]$ almost surely, the function $s \mapsto \mathbf{E}[C_n(s)]$ is continuous for any $n$. Curien and Joseph [6] only show pointwise convergence, and proving uniform convergence requires a good deal of additional arguments.

Proposition 12 (Uniform convergence). There exists $\varepsilon > 0$ such that

$$\sup_{s\in[0,1]} \big|n^{-\beta}\mathbf{E}[C_n(s)] - \mu_1(s)\big| = O(n^{-\varepsilon}).$$

In other words, $n^{-\beta}\mathbf{E}[C_n(s)]$ converges uniformly to $\mu_1$ on $[0, 1]$ with polynomial rate.
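
To get a feeling for the quantity $n^{-\beta}\mathbf{E}[C_n(s)]$, one can simulate it directly. The sketch below is an illustration under the standard model (independent uniform points inserted one by one into a point quadtree); the class and function names are hypothetical, and the Monte Carlo estimate is only a rough empirical proxy for the expectation.

```python
import random

BETA = (17 ** 0.5 - 3) / 2


class QuadNode:
    def __init__(self, x, y):
        self.x, self.y = x, y
        # Child index = 2 * (east of the node) + (north of the node).
        self.children = [None, None, None, None]


def insert(root, x, y):
    """Insert a point into the quadtree and return the (possibly new) root."""
    if root is None:
        return QuadNode(x, y)
    node = root
    while True:
        idx = 2 * int(x >= node.x) + int(y >= node.y)
        if node.children[idx] is None:
            node.children[idx] = QuadNode(x, y)
            return root
        node = node.children[idx]


def partial_match_cost(node, s):
    """Number of nodes visited by the partial match query {x = s}."""
    if node is None:
        return 0
    base = 2 if s >= node.x else 0  # only the two children on the side of s
    return (1 + partial_match_cost(node.children[base], s)
            + partial_match_cost(node.children[base + 1], s))


def estimate_rescaled_mean(n, s, trials, seed=0):
    """Monte Carlo estimate of n^(-beta) * E[C_n(s)]."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        root = None
        for _ in range(n):
            root = insert(root, rng.random(), rng.random())
        total += partial_match_cost(root, s)
    return total / trials / n ** BETA
```

Evaluating `estimate_rescaled_mean` on a grid of `s` for increasing `n` gives an empirical picture of the curve that Proposition 12 describes.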

We first prove a poissonized version; the routine Tauberian arguments yielding Proposition 12 are presented in Section 5.3. Consider a Poisson point process with unit intensity on $[0, 1]^2 \times [0, \infty)$. The first two coordinates represent the location inside the unit square; the third one represents the time of arrival of the point. Let $P_t(s)$ denote the partial match cost for a query at $x = s$ in the quad tree built from the points arrived by time $t$.

Proposition 13. There exists $\varepsilon > 0$ such that

$$\sup_{s\in[0,1]} \big|t^{-\beta}\mathbf{E}[P_t(s)] - \mu_1(s)\big| = O(t^{-\varepsilon}).$$

The proof of Proposition 13 relies crucially on two main ingredients: first, a strengthening of the arguments developed by Curien and Joseph [6], and second, the speed of convergence of $\mathbf{E}[C_n(\xi)]$ for a uniform query line $\xi$, see (2), by Chern and Hwang [4]. By symmetry, we write for any $\delta \in (0, 1/2)$

$$\sup_{s\in[0,1]} \big|t^{-\beta}\mathbf{E}[P_t(s)] - \mu_1(s)\big| = \sup_{s\in[0,1/2]} \big|t^{-\beta}\mathbf{E}[P_t(s)] - \mu_1(s)\big|$$

$$\le \sup_{s\le\delta} \big|t^{-\beta}\mathbf{E}[P_t(s)] - \mu_1(s)\big| + \sup_{s\in(\delta,1/2]} \big|t^{-\beta}\mathbf{E}[P_t(s)] - \mu_1(s)\big|. \tag{29}$$

The two terms in the right hand side above are controlled by the following lemmas. Their proofs are 
presented in the following two subsections. 

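Lemma 14 (Behavior on the edge). We have

$$\sup_{s\le\delta} \big|t^{-\beta}\mathbf{E}[P_t(s)] - \mu_1(s)\big| \le 2^\beta \sup_{r\ge t/2} r^{-\beta}\mathbf{E}[P_r(\delta)] + K_1\delta^{\beta/2}. \tag{30}$$

Lemma 15 (Behavior away from the edge). There exist constants $C_1$, $C_2$, $\eta$ with $0 < \eta < \beta$ and $\gamma \in (0, 1)$ such that, for any integer $k$ and real number $\delta \in (0, 1/2)$, we have, for any real number $t > 0$,

$$\sup_{s\in[\delta,1/2]} \big|t^{-\beta}\mathbf{E}[P_t(s)] - \mu_1(s)\big| \le C_1\delta^{-1}(1-\gamma)^k + C_2\, k\, 2^k (\beta-\eta)^{-2k}\, t^{-\eta}.$$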

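Before proceeding with the proofs of the lemmas, we indicate how they imply Proposition 13. By Lemmas 14 and 15, we have for any $\delta \in (0, 1/2)$ and natural number $k \ge 1$

$$\sup_{s\in[0,1]} \big|t^{-\beta}\mathbf{E}[P_t(s)] - \mu_1(s)\big| \le 3K_1\delta^{\beta/2} + 3C_1\delta^{-1}(1-\gamma)^k + 5C_2\, k\, t^{-\eta}\, 2^k (\beta-\eta)^{-2k}.$$

Choosing $\delta = t^{-\nu}$ and $k = \lfloor \alpha\log t \rfloor$ for $\nu, \alpha > 0$ to be determined, we obtain

$$\sup_{s\in[0,1]} \big|t^{-\beta}\mathbf{E}[P_t(s)] - \mu_1(s)\big| \le 3K_1 t^{-\nu\beta/2} + 3C_1 t^{\nu}(1-\gamma)^{\alpha\log t - 1} + 5C_2\, t^{-\eta}\big[2/(\beta-\eta)^2\big]^{\alpha\log t}\, \alpha\log t.$$

First pick $\alpha > 0$ small enough that $\alpha\log\big(2/(\beta-\eta)^2\big) < \eta$, so that the last term decays polynomially in $t$. This $\alpha$ being fixed, choose $\nu > 0$ small enough that $\nu + \alpha\log(1-\gamma) < 0$. The claim follows.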
5.1 Behavior along the edge: proof of Lemma 14

To deal with the term involving the values of $s \in [0, \delta]$, we relate the value $\mathbf{E}[P_t(s)]$ to $\mathbf{E}[P_{t'}(\delta)]$. The term $\mathbf{E}[P_{t'}(\delta)]$ will then be shown to be small by choosing $\delta$ small.

The limit first moment $\mu_1(s) = \lim_{t\to\infty} t^{-\beta}\mathbf{E}[P_t(s)]$ is monotonic for $s \in [0, 1/2]$. It seems, at least intuitively, that for any fixed real number $t > 0$, $\mathbf{E}[P_t(s)]$ should also be monotonic for $s \in [0, 1/2]$, but we were unable to prove it. The following weaker version will be sufficient for our purpose.

Proposition 16 (Almost monotonicity). For any $s < 1/2$ and $\varepsilon \in [0, 1 - 2s)$, we have

$$\mathbf{E}[P_t(s)] \le \mathbf{E}\Big[P_{t(1+\varepsilon)}\Big(\frac{s+\varepsilon}{1+\varepsilon}\Big)\Big].$$

The idea underlying Proposition 16 requires understanding what happens to the quad tree upon considering a larger point set. For a finite point set $\mathcal{P} \subset [a, b] \times [0, 1] \times [0, \infty)$, we let $V(\mathcal{P})$ and $H(\mathcal{P})$ denote, respectively, the set of vertical and horizontal line segments of the quad tree built from $\mathcal{P}$.

Lemma 17. Let $\mathcal{P} = \{p_1, \ldots, p_n\}$ be a set of points with $p_i = (x_i, y_i, t_i) \in [a_2, a_3] \times [0, 1] \times [0, \infty)$ ordered by their $t$ coordinate, i.e. $t_i < t_{i+1}$. Additionally we assume $\mathcal{P}$ to be in general position, meaning that all $x$-coordinates are pairwise different and the same holds true for the $y$ and $t$ coordinates. Furthermore let $\mathcal{Q} = \{p'_1, \ldots, p'_m\} \subseteq [a_1, a_2] \times [0, 1] \times [0, \infty)$ with $p'_i = (x'_i, y'_i, t'_i)$, again ordered according to their third coordinate, such that $\mathcal{P} \cup \mathcal{Q} \subseteq [a_1, a_3] \times [0, 1] \times [0, \infty)$ is again in general position. Then we have

$$H(\mathcal{P} \cup \mathcal{Q}) \supseteq H(\mathcal{P}) \quad\text{and}\quad V(\mathcal{P} \cup \mathcal{Q}) \subseteq V(\mathcal{P}).$$

Proof. We assume for a contradiction that the assertion is wrong and focus on the case that $H(\mathcal{P}) \not\subseteq H(\mathcal{P} \cup \mathcal{Q})$; the other case is handled analogously. Let $i_1$ be the index of the "first" point in $\mathcal{P}$ such that the horizontal line of $p_{i_1}$ is shorter (at least on the right or left side of the point) in the quadtree built from $\mathcal{P} \cup \mathcal{Q}$ than it is in the one built from $\mathcal{P}$. Here, first refers to the time coordinate $t$. Now, by construction there must be an index $i_2$ such that the vertical line of $p_{i_2}$ blocks the horizontal line of $p_{i_1}$ in $\mathcal{P} \cup \mathcal{Q}$ but not in $\mathcal{P}$. We again choose $i_2$ such that $t_{i_2}$ is minimal with this property; by construction $t_{i_2} < t_{i_1}$. Repeating the argument gives the existence of an index $i_3$ and a point $p_{i_3}$ whose horizontal line blocks the vertical line of $p_{i_2}$ in $\mathcal{P}$ but not in $\mathcal{P} \cup \mathcal{Q}$, with $t_{i_3} < t_{i_2}$. This obviously contradicts the choice of $i_1$. □

Proof of Proposition 16. Consider the unit square $[0, 1]^2$ and the extended box $[-\varepsilon, 1] \times [0, 1]$, and a single Poisson point process on $[-\varepsilon, 1] \times [0, 1] \times [0, t]$ with unit intensity. Write $\tilde{P}_t(s)$ for the number of (horizontal) lines intersecting $\{x = s\}$ in the quad tree formed by all the points. Similarly, let $P_t(s)$ be the corresponding quantity when the quad tree is formed using only the points falling inside $[0, 1]^2$. Then, for this coupling, we have by Lemma 17,

$$P_t(s) \le \tilde{P}_t(s) \overset{d}{=} P_{t(1+\varepsilon)}\Big(\frac{s+\varepsilon}{1+\varepsilon}\Big).$$

Taking expectations completes the proof. □

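Proof of Lemma 14. We use Proposition 16 to relate $\mathbf{E}[P_t(s)]$ to $\mathbf{E}[P_{t'}(\delta)]$ for some $t'$. Choosing $\varepsilon = (\delta - s)/(1 - \delta)$ yields $t' = t(1 - s)/(1 - \delta) \le t(1 - \delta)^{-1}$. Thus, for any $\delta \in (0, 1/2)$ and $t > 0$ we have

$$\sup_{s\le\delta} \big|t^{-\beta}\mathbf{E}[P_t(s)] - \mu_1(s)\big| \le \sup_{s\le\delta} t^{-\beta}\mathbf{E}[P_t(s)] + \mu_1(\delta)$$

$$\le \sup_{s\le\delta} t^{-\beta}\mathbf{E}[P_{t'}(\delta)] + \mu_1(\delta)$$

$$\le t^{-\beta}\mathbf{E}[P_{t/(1-\delta)}(\delta)] + \mu_1(\delta)$$

$$\le (1-\delta)^{-\beta} \sup_{r\ge t/2} r^{-\beta}\mathbf{E}[P_r(\delta)] + \mu_1(\delta).$$

This completes the proof since $\delta < 1/2$ and $\mu_1(s) \le K_1 s^{\beta/2}$. □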
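
The choice $\varepsilon = (\delta - s)/(1 - \delta)$ can be checked with exact arithmetic. The sketch below assumes that enlarging the box to $[-\varepsilon, 1]$ and rescaling it to unit width sends the query position $s$ to $(s+\varepsilon)/(1+\varepsilon)$ and multiplies the time (the expected number of points) by $1+\varepsilon$; the function name `rescale` is illustrative.

```python
from fractions import Fraction


def rescale(s, delta):
    """Return (eps, new query position, time factor) for eps = (delta - s)/(1 - delta)."""
    eps = (delta - s) / (1 - delta)
    new_position = (s + eps) / (1 + eps)   # position of the query in the rescaled box
    time_factor = 1 + eps                  # t' = t * time_factor
    return eps, new_position, time_factor


# Example with exact rationals: the query at s = 1/10 is moved to delta = 1/4.
eps, pos, factor = rescale(Fraction(1, 10), Fraction(1, 4))
```

With these values, `pos` equals `delta` and `factor` equals `(1 - s) / (1 - delta)`, which is the relation $t' = t(1-s)/(1-\delta)$ used in the proof.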
5.2 Behavior away from the edge: proof of Lemma 15

The core of the work is to bound the second term in (29), involving $s \in (\delta, 1/2]$. We prove that $t^{-\beta}\mathbf{E}[P_t(s)]$ is uniformly Cauchy on $(\delta, 1/2]$ by tightening some of the arguments in [6]. We could start from (14) there, but we feel that the reader would follow more easily if we re-explain the approach. Observe that most of the quantities defined in the remainder of the section depend on $s$, which we omit from the notation for the sake of readability.

The first step is to unfold $k$ levels of the fundamental recurrence (10) in the Poisson case. Let $\tau_1$ be the arrival time of the first point in the Poisson process and $Q_1 = Q_1(s)$ be the lower of the two rectangles that intersect the line $\{x = s\}$ after inserting the first point. Inductively let $\tau_k = \tau_k(s)$ be the arrival time of the first point of the process in the region $Q_{k-1}$ and $Q_k$ be the lower of the two rectangles that hit the line $\{x = s\}$ at time $\tau_k$. For convenience, set $Q_0 = [0, 1]^2$. Finally, let $\tilde{P}_t$ be an independent copy of the process $P_t$ (set $\tilde{P}_t = 0$ for $t < 0$). At level one, using the horizontal symmetry, we have

$$\mathbf{E}[P_t(s)] = \mathbf{P}(t \ge \tau_1) + 2\,\mathbf{E}\big[\tilde{P}_{\mathrm{Leb}(Q_1)(t-\tau_1)}(\xi_1)\big],$$

where $\xi_1 = \xi_1(s) \in [0, 1]$ denotes the location of the line $\{x = s\}$ relative to the region $Q_1$. If the interval $[\ell_1, r_1]$ denotes the projection of $Q_1$ on the first axis, we have

$$\xi_1(s) = \frac{s - \ell_1}{r_1 - \ell_1}.$$

Write $\xi_k = \xi_k(s) \in [0, 1]$ for the location of the line $\{x = s\}$ relative to the region $Q_k$, and $M_k = \mathrm{Leb}(Q_k)$. Then, unfolding up to level $k$, we obtain

$$\mathbf{E}[P_t(s)] = g_k(t) + 2^k\,\mathbf{E}\big[\tilde{P}_{M_k(t-\tau_k)}(\xi_k)\big], \tag{31}$$

where $0 \le g_k(t) \le 2^k - 1$. Next, we introduce the inter-arrival times $\zeta_k = \tau_k - \tau_{k-1}$ with $\tau_0 := 0$ and their normalized versions $\bar{\zeta}_k = \zeta_k M_{k-1}$ (again $\bar{\zeta}_0 := 0$). Defining $F_k = M_k\tau_k$, we can rewrite (31) as

$$\mathbf{E}[P_t(s)] = g_k(t) + 2^k\,\mathbf{E}\big[\tilde{P}_{M_k t - F_k}(\xi_k)\big]. \tag{32}$$

Note that $(\bar{\zeta}_k)_{k\ge1}$ are i.i.d. exponential random variables with unit mean, also independent of $(\xi_k, Q_k)_{k\ge1}$. Before going any further, note that, as we have already seen in Section 4, the region $Q_k$ is not distributed like a typical rectangle at level $k$; in particular $\mathrm{Leb}(Q_k)$ is not distributed as $X_1Y_1 \cdots X_kY_k$ for independent $[0, 1]$-uniform random variables $X_i, Y_i$, $i \ge 1$. Intuitively, $Q_k$ should be stochastically larger than a typical cell, since it is conditioned to intersect the line $\{x = s\}$. This is verified by the following lemma.

Lemma 18. For any $s \in (0, 1)$ and any integer $k \ge 0$, we have

$$\mathrm{Leb}(Q_k) = M_k \ge_{st} X_1Y_1 \cdots X_kY_k,$$

where $X_i, Y_i$, $i \ge 1$, are independent random variables uniform on $[0, 1]$.

Proof. Consider one split, at a point $(X, Y)$ uniform inside the unit square. The split creates four new boxes, two of them being hit by $s$. Let $L$ be the length of these two cells. Their height is either $Y$ or $1 - Y$, which are both uniform. So it suffices to prove that $L \ge_{st} X$. By symmetry, it suffices to consider $s < 1/2$. We have

$$L = \mathbf{1}_{\{s<X\}}X + \mathbf{1}_{\{s\ge X\}}(1 - X).$$

Write $F_L(y) = \mathbf{P}(L \le y)$ and $F_X(y) = \mathbf{P}(X \le y) = y$. It is then easy to see that

$$F_L(y) = \mathbf{P}(L \le y) = \begin{cases} 0, & y < s, \\ y - s, & s \le y < 1 - s, \\ 2y - 1, & y \ge 1 - s. \end{cases}$$

Hence, for all $s \in (0, 1/2)$ and all $y \in (0, 1)$ we have $F_L(y) \le y = F_X(y)$. The result follows. □
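
The one-split comparison $L \ge_{st} X$ can be illustrated numerically. The following sketch assumes $s < 1/2$, as in the proof; `cdf_L` encodes the three-piece distribution function of $L$ and `sample_L` draws $L$ from a uniform split position. Both names are illustrative.

```python
import random


def cdf_L(y, s):
    """Distribution function of L for a query at s < 1/2."""
    if y < s:
        return 0.0
    if y < 1 - s:
        return y - s
    return min(1.0, 2 * y - 1)


def sample_L(s, rng):
    """Width of the cells hit by the line {x = s} after one uniform split."""
    x = rng.random()
    return x if s < x else 1 - x


def empirical_cdf(y, s, samples=20000, seed=0):
    rng = random.Random(seed)
    return sum(sample_L(s, rng) <= y for _ in range(samples)) / samples
```

Checking `cdf_L(y, s) <= y` on a grid confirms the stochastic domination for one split, and `empirical_cdf` should agree with `cdf_L` up to Monte Carlo error.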

The second term will be treated using results for the case $s = \xi$, for a uniform random variable $\xi$ independent of everything else. Curien and Joseph [6] found a very clever way to circumvent the problem that, for any $k \ge 1$, the random variable $\xi_k$ is not uniformly distributed on $[0, 1]$. In their Proposition 4.1 they introduce a version of the homogeneous Markov chain $(\xi_k, \mathcal{M}_k)_{k\ge1}$, where $\mathcal{M}_k := M_k/M_{k-1}$, together with a random time $T$ such that for any $k \in \mathbb{N}$, conditionally on $\{T \le k\}$, the random variable $\xi_k$ is uniformly distributed on $[0, 1]$, independent of $(\mathcal{M}_1, \ldots, \mathcal{M}_k, T)$. Choosing these random variables independent of the process $\tilde{P}_t$, we will use them in the following without changing the notation ($F_k$ can be constructed using $(\mathcal{M}_i)_{1\le i\le k}$ and an additional set of i.i.d. exponential random variables with mean one). The details of the definition of $T$ are not important for us. The only crucial thing is that $T$ has exponential tails. Indeed, we have [p. 15 of 6]

$$\mathbf{E}[1.15^T] \le C_4\,(s \wedge (1 - s))^{-1/2} \le C_4\,\delta^{-1/2} \tag{33}$$

for some constant $C_4$ in the present case, $\delta \le s \le 1/2$.

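Then, using (32) and the triangle inequality, we obtain for any $t$ and $r$ such that $r > t$,

$$\big|t^{-\beta}\mathbf{E}[P_t(s)] - r^{-\beta}\mathbf{E}[P_r(s)]\big| \le 2^k\big|t^{-\beta}\mathbf{E}[\tilde{P}_{M_kt-F_k}(\xi_k)] - r^{-\beta}\mathbf{E}[\tilde{P}_{M_kr-F_k}(\xi_k)]\big| + 2^{k+1}t^{-\beta}$$

$$\le 2^k\big|t^{-\beta}\mathbf{E}[\tilde{P}_{M_kt-F_k}(\xi_k)\mathbf{1}_{\{T\le k\}}] - r^{-\beta}\mathbf{E}[\tilde{P}_{M_kr-F_k}(\xi_k)\mathbf{1}_{\{T\le k\}}]\big|$$

$$\quad + 2^k\big|t^{-\beta}\mathbf{E}[\tilde{P}_{M_kt-F_k}(\xi_k)\mathbf{1}_{\{T>k\}}] - r^{-\beta}\mathbf{E}[\tilde{P}_{M_kr-F_k}(\xi_k)\mathbf{1}_{\{T>k\}}]\big| + 2^{k+1}t^{-\beta}. \tag{34}$$

To complete the proof of Lemma 15, we now devise explicit bounds for the two main terms in (34), according to whether coupling has occurred by level $k$ (i.e., $T \le k$) or not.

i. No coupling by level $k$, $T > k$. In this case, we bound the terms roughly. We obtain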

t-^E[PM,t-FA^k)l{T>k}] - r-^nPM,r-FAik)l{T>k}] 
<2''+^SUT,U-^-E[PM,u-F,i^k)l{T>k}]- 

U>t 

One then essentially uses the uniform bound $\sup_s \sup_u u^{-\beta}\mathbf{E}[P_u(s)] \le C_5$ (see (10) in [6]), together with Hölder's and Markov's inequalities, to obtain a bound that exploits the exponential tails of $T$. The details are found in [6], p. 16. For any $p > 1$ and $s \in (\delta, 1/2]$, one has

ik-i)/p /-^.n 1 kT] \ 

U 



2k 



(/?p+l)(/?p + 2) 



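by the upper bound in (33). Choosing $p$ close enough to one that the term in the brackets above is strictly less than one, we obtain for any $s \in [\delta, 1/2]$ and real numbers $r > t > 0$,

$$2^k\big|t^{-\beta}\mathbf{E}[\tilde{P}_{M_kt-F_k}(\xi_k)\mathbf{1}_{\{T>k\}}] - r^{-\beta}\mathbf{E}[\tilde{P}_{M_kr-F_k}(\xi_k)\mathbf{1}_{\{T>k\}}]\big| \le 2C_4C_5\,\delta^{-1/2-1/(2p)}(1-\gamma)^k \le C_1\delta^{-1}(1-\gamma)^k, \tag{35}$$

where $C_1$ denotes a constant and $\gamma > 0$ (and $p > 1$ is now fixed).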

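ii. Coupling has occurred before level $k$, $T \le k$. In this case, we need to be a little more careful and match some terms. In what follows, we write $x_+ = x \vee 0$. We start with

$$t^{-\beta}2^k\,\mathbf{E}[\tilde{P}_{M_kt-F_k}(\xi_k)\mathbf{1}_{\{T\le k\}}] = 2^k\,\mathbf{E}\big[\mathbf{1}_{\{T\le k\}}(M_k - t^{-1}F_k)_+^\beta\, \theta(M_kt - F_k)\big],$$

where $\theta(x) = x^{-\beta}\mathbf{E}[P_x(X)]$ with $X$ a $[0, 1]$-uniform random variable independent of everything else. The estimate in (2) is easily transferred to the poissonized version, and we have $\theta(x) = \kappa + O(x^{-\eta})$ for any $0 < \eta < \beta$. Therefore

$$2^k\big|t^{-\beta}\mathbf{E}[\tilde{P}_{M_kt-F_k}(\xi_k)\mathbf{1}_{\{T\le k\}}] - r^{-\beta}\mathbf{E}[\tilde{P}_{M_kr-F_k}(\xi_k)\mathbf{1}_{\{T\le k\}}]\big|$$

$$\le 2^k\big|\mathbf{E}[\mathbf{1}_{\{T\le k\}}(M_k - t^{-1}F_k)_+^\beta\,\theta(M_kt - F_k)] - \mathbf{E}[\mathbf{1}_{\{T\le k\}}(M_k - r^{-1}F_k)_+^\beta\,\theta(M_kr - F_k)]\big|$$

$$\le 2^k\,\mathbf{E}\big[\big|(M_k - t^{-1}F_k)_+^\beta\,\theta(M_kt - F_k) - (M_k - r^{-1}F_k)_+^\beta\,\theta(M_kr - F_k)\big|\big]. \tag{36}$$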
Fix $\eta < \beta$. For $x > 0$, we have, as $x \to \infty$,

$$(M_k - x^{-1}F_k)_+^\beta \cdot \theta(M_kx - F_k) = M_k^\beta\big(1 - O(x^{-1}F_kM_k^{-1})\big)\big(\kappa + O(M_k^{-\eta}x^{-\eta})\big)$$

since $M_k \in (0, 1)$ and $\eta < \beta$, the $O(\cdot)$ terms being deterministic and uniform in $s \in [0, 1]$. Going back to (36), the terms $\kappa M_k^\beta$ coming from the two terms with $t$ and $r$ cancel out, and there exist constants $C_7$, $C_8$ such that, for all $t$, $r$ large enough with moreover $t \le r$, we have

$$2^k\big|t^{-\beta}\mathbf{E}[\tilde{P}_{M_kt-F_k}(\xi_k)\mathbf{1}_{\{T\le k\}}] - r^{-\beta}\mathbf{E}[\tilde{P}_{M_kr-F_k}(\xi_k)\mathbf{1}_{\{T\le k\}}]\big| \le C_8\, 2^k\, t^{-\eta}\,\mathbf{E}[F_kM_k^{\beta-\eta-1}].$$

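Since it will be necessary to choose $k$ tending to infinity with $t$ to control the term in (35), it remains to estimate $\mathbf{E}[F_kM_k^{\beta-\eta-1}]$. By definition of $F_k = M_k\tau_k$, one easily verifies that $F_k \le \sum_{i=1}^{k} \bar{\zeta}_i$, where the normalized inter-arrival times $\bar{\zeta}_i$ were defined right after (31). Since $M_i \le 1$ for every $i$, we have

$$\mathbf{E}[F_kM_k^{\beta-\eta-1}] \le k\,\mathbf{E}[X^{\beta-\eta-1}]^{2k} = k\,(\beta-\eta)^{-2k}$$

by the lower bound on $M_k$ in Lemma 18, $X$ denoting a uniform on $[0, 1]$. We finally obtain

$$2^k\big|t^{-\beta}\mathbf{E}[\tilde{P}_{M_kt-F_k}(\xi_k)\mathbf{1}_{\{T\le k\}}] - r^{-\beta}\mathbf{E}[\tilde{P}_{M_kr-F_k}(\xi_k)\mathbf{1}_{\{T\le k\}}]\big| \le C_8\, k\, 2^k (\beta-\eta)^{-2k}\, t^{-\eta}. \tag{37}$$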

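Putting (35) and (37) together with (34) yields, for any $r > 0$ such that $t \le r$,

$$\big|t^{-\beta}\mathbf{E}[P_t(s)] - r^{-\beta}\mathbf{E}[P_r(s)]\big| \le C_1\delta^{-1}(1-\gamma)^k + C_8\, k\, 2^k(\beta-\eta)^{-2k}\, t^{-\eta} + 2^{k+1}t^{-\beta}$$

$$\le C_1\delta^{-1}(1-\gamma)^k + C_2\, k\, 2^k(\beta-\eta)^{-2k}\, t^{-\eta},$$

for some constant $C_2$. The statement in Lemma 15 follows readily from the triangle inequality.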

5.3 Depoissonization: proof of Proposition 12 

The depoissonization relies on standard arguments based on the concentration of Poisson random variables 
and the monotonicity of E[Cn(s)] in n for each s G (0, 1). 

We first state a standard concentration bound for Poisson distribution that we will use. It follows easily 
from the Chernoff bound and we omit the proof. 

Lemma 19. Let $N$ be Poisson($t$). Then, there exists $\delta_0 > 0$ such that for every $\delta \in (0, \delta_0)$ and every $t > 0$,

$$\mathbf{P}(|N - t| \ge t\delta) \le 2e^{-t\delta^2/3}.$$
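
A concentration bound of this type can be compared with exact Poisson tails. The sketch below assumes the bound has the form $2e^{-t\delta^2/3}$, as it is used later in this section; the function names are illustrative, and the exact tail is obtained by summing the Poisson probability mass function in log-space.

```python
import math


def poisson_two_sided_tail(t, delta):
    """Exact P(|N - t| >= t*delta) for N ~ Poisson(t)."""
    lo, hi = t * (1 - delta), t * (1 + delta)
    inside = 0.0
    for k in range(max(0, math.floor(lo)), math.ceil(hi) + 1):
        if lo < k < hi:
            # log pmf: k*log(t) - t - log(k!)
            inside += math.exp(k * math.log(t) - t - math.lgamma(k + 1))
    return max(0.0, 1.0 - inside)


def chernoff_type_bound(t, delta):
    """The bound 2*exp(-t*delta^2/3) assumed for Lemma 19."""
    return 2 * math.exp(-t * delta ** 2 / 3)
```

For moderate `t` and small `delta` the exact tail sits well below the bound, which is all the depoissonization argument needs.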

Recall that we not only need to prove $n^{-\beta}\mathbf{E}[C_n(s)] \to \mu_1(s)$ uniformly for $s \in (0, 1)$, we also want

to conserve the polynomial error rate. We first focus on the upper bound. Write En = n~^/^ and let 
N ~ Poisson(n(l + Sn)) be independent of the process building up the discrete quadtree. Then Cn{s) = 
Pn(i+£^)(^)- By monotonicity, we have 

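$$\mathbf{E}[C_n(s)] = \mathbf{E}[C_n(s)\mathbf{1}_{\{N\ge n\}}] + \mathbf{E}[C_n(s)\mathbf{1}_{\{N<n\}}]$$

$$\le \mathbf{E}[C_N(s)\mathbf{1}_{\{N\ge n\}}] + \mathbf{E}[C_n(s)\mathbf{1}_{\{N<n\}}]$$

$$\le \mathbf{E}[C_N(s)] + \mathbf{E}[C_n(s)\mathbf{1}_{\{N<n\}}]$$

$$\le \mathbf{E}[C_N(s)] + n\,\mathbf{P}(N < n),$$

since $C_n(s) \le n$. For $t = n(1 + \varepsilon_n)$ and $\delta = \varepsilon_n/2$, we have $t(1 - \delta) = n(1 + \varepsilon_n)(1 - \varepsilon_n/2) = n(1 + \varepsilon_n/2 + o(\varepsilon_n)) \ge n$, for all $n$ large enough. It follows from Lemma 19, for all $n$ large enough,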

E[Cn{s)] < E[C7v(5)] +ne-^(^+^-)^'/^ 
<E[C^(.)]+e--^'^A^ 
Therefore, for any $s \in [0, 1]$,

n-^E[Cn{s)] - < n-^E[CN{s)] - Mi( 



s) + n ' e 



= (1 + s„f[n{l + SnT^nCNis)] - + n-''e-"'''/i2 (38) 

< [n(l + SnT^nCNis)] - ^l{s) + £„[n(l + SnT^nC'Nis)] + 71-^6-"'^'/'^ 

Similarly, we can obtain a lower bound using $N' \sim$ Poisson($n(1 - \varepsilon_n)$), again independent of the discrete process. We obtain

E[C„(s)] = E[C„(s)l{jv'>n}] +E[C„(s)l{yv'<„}] 
>E[CiV'(s)l{jv'>„}] 

= E[Cn'{s)] - -E[CN'{s)l{N'<n}] 

> E[CAr/(s)] -nP {N' < n). 
We again aim at using Lemma 19. Set $t = n(1 - \varepsilon_n)$ and $\delta = \varepsilon_n$. Then $t(1 + \delta) = n(1 - \varepsilon_n^2) < n$ so that

E[C„(s)] > E[Cw,(s)] - ne-"(i-^")^"/3 

>E[C,V'(s)]-e-"'''/^^ 
for all n large enough. It follows that, for any s G [0, 1], 

n-^E[C„is)] - > (1 - e„)[n(l - e„)]-''E[CiV'(s)] " " n-^e-^"'"/'' 

> [n(l - e„)]-^nCN'{s)] - /xi(s) - n-''e-"'''/i2^ (39) 

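Finally, using $C_N(s) \overset{d}{=} P_{n(1+\varepsilon_n)}(s)$ and $C_{N'}(s) \overset{d}{=} P_{n(1-\varepsilon_n)}(s)$, putting (38) and (39) together, and using Proposition 13, we obtain

$$\sup_{s\in[0,1]} \big|n^{-\beta}\mathbf{E}[C_n(s)] - \mu_1(s)\big| = O(n^{-\varepsilon}),$$

where $\varepsilon$ is given in Proposition 13. Hence the proof of Proposition 12 is complete.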

6 Moments and supremum 

Our main result implies the convergence of the second moment of the discrete process towards that of the limit process. This section is devoted to identifying this limit; in particular, it provides an explicit expression for the limit variance.

Proposition 20. Let Z{s) be the process constructed in Section 4 with mean h{s). Then there exists a 
non-negative random variable Z with unit mean such that 

= E [Z^] 

satisfies the recursion 

Cm = 7 .u"^^^ So A E f + ^' - ^) + l)CiCm-i. (40) 

(m - 1) (m + 1 - ^fJm) \^ J 
for m > 2 and for any s G [0, 1] 

Z{s)^Z-{s{l-s)f/\ (41) 

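In particular,

$$\mathrm{Var}(Z(s)) = K_2\, h^2(s) = (c_2 - 1)\, h^2(s), \tag{42}$$

and for a query line $\xi$ uniformly distributed on $[0, 1]$, and independent of $Z$,

$$\mathrm{Var}(Z(\xi)) = K_3 := \frac{2(2\beta+1)}{3(1-\beta)}\big(B(\beta+1, \beta+1)\big)^2 - \Big(B\Big(\frac{\beta}{2}+1, \frac{\beta}{2}+1\Big)\Big)^2. \tag{43}$$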

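Proof. The definition of the process $Z(s)$ implies that the second moment $\mu_2(s) = \mathbf{E}[Z(s)^2]$ satisfies an integral equation. Writing $Y$ for a $[0, 1]$-uniform random variable, we have

$$\mu_2(s) = \mathbf{E}[Z(s)^2] = 2\,\mathbf{E}\Big[\int_s^1 (xY)^{2\beta} Z^{(1)}\Big(\frac{s}{x}\Big)^2 dx\Big] + 2\,\mathbf{E}\Big[\int_0^s ((1-x)Y)^{2\beta} Z^{(3)}\Big(\frac{s-x}{1-x}\Big)^2 dx\Big]$$

$$\quad + 2\,\mathbf{E}\Big[\int_s^1 x^{2\beta}(Y(1-Y))^\beta Z^{(1)}\Big(\frac{s}{x}\Big) Z^{(2)}\Big(\frac{s}{x}\Big) dx\Big] + 2\,\mathbf{E}\Big[\int_0^s (1-x)^{2\beta}(Y(1-Y))^\beta Z^{(3)}\Big(\frac{s-x}{1-x}\Big) Z^{(4)}\Big(\frac{s-x}{1-x}\Big) dx\Big]$$

$$= 2\,\mathbf{E}[Y^{2\beta}] \cdot \Big\{\int_s^1 x^{2\beta}\mu_2\Big(\frac{s}{x}\Big) dx + \int_0^s (1-x)^{2\beta}\mu_2\Big(\frac{s-x}{1-x}\Big) dx\Big\}$$

$$\quad + 2\,\mathbf{E}[(Y(1-Y))^\beta] \cdot \Big\{\int_s^1 x^{2\beta} h\Big(\frac{s}{x}\Big)^2 dx + \int_0^s (1-x)^{2\beta} h\Big(\frac{s-x}{1-x}\Big)^2 dx\Big\}.$$

It now follows that $\mu_2$ satisfies the following integral equation:

$$\mu_2(s) = \frac{2}{2\beta+1}\Big\{\int_s^1 x^{2\beta}\mu_2\Big(\frac{s}{x}\Big) dx + \int_0^s (1-x)^{2\beta}\mu_2\Big(\frac{s-x}{1-x}\Big) dx\Big\} + 2B(\beta+1, \beta+1)\frac{h^2(s)}{\beta+1}.$$

One easily verifies that the function $f$ given by $f(s) = c_2 h^2(s)$ solves the above equation when $c_2$ satisfies

$$c_2 = c_2\,\frac{2}{(2\beta+1)(\beta+1)} + \frac{2B(\beta+1, \beta+1)}{\beta+1}.$$

We obtain, after simplification using $\beta^2 = 2 - 3\beta$,

$$c_2 = 2B(\beta+1, \beta+1)\,\frac{2\beta+1}{3(1-\beta)}. \tag{44}$$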



It now suffices to prove that the integral equation that $\mu_2$ satisfies admits a unique solution. To this aim, we show that the map $K$ defined below is a contraction for the supremum norm:

$$Kf(s) = \frac{2}{2\beta+1}\Big\{\int_s^1 x^{2\beta} f\Big(\frac{s}{x}\Big) dx + \int_0^s (1-x)^{2\beta} f\Big(\frac{s-x}{1-x}\Big) dx\Big\} + 2B(\beta+1, \beta+1)\frac{h^2(s)}{\beta+1}. \tag{45}$$


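For any two functions $f$ and $g$, measurable and bounded on $[0, 1]$, we have

$$\|Kf - Kg\| = \frac{2}{2\beta+1} \sup_{s\in[0,1]} \Big|\int_s^1 x^{2\beta}(f-g)\Big(\frac{s}{x}\Big) dx + \int_0^s (1-x)^{2\beta}(f-g)\Big(\frac{s-x}{1-x}\Big) dx\Big|$$

$$\le \frac{2}{2\beta+1}\Big(\sup_{s\in[0,1]} \int_s^1 x^{2\beta} dx + \sup_{s\in[0,1]} \int_0^s (1-x)^{2\beta} dx\Big)\|f - g\|$$

$$\le \frac{4}{(2\beta+1)^2}\,\|f - g\|.$$

Since $2\beta + 1 > 2$, the operator $K$ is a contraction on the set of measurable and bounded functions on $[0, 1]$ equipped with the supremum norm. The Banach fixed point theorem then ensures that the fixed point is unique, which shows that indeed

$$\mathbf{E}[Z(s)^2] = c_2 h^2(s).$$

Then, $K_2 = c_2 - 1$ and by integration

$$\mathrm{Var}(Z(\xi)) = c_2 B(\beta+1, \beta+1) - \Big(B\Big(\frac{\beta}{2}+1, \frac{\beta}{2}+1\Big)\Big)^2.$$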
Analogously one shows that the $m$-th moment of $Z(s)$ is of the form $c_m h^m(s)$ where $c_m$ solves (8). The Lipschitz constant of the corresponding operator in (45) is $4/(\beta m + 1)^2$, hence again smaller than one. This immediately implies that $(c_m)_{m\ge1}$ are the moments of $Z(s)/h(s)$, independently of $s$.
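
The constants of the second-moment computation can be evaluated numerically. The sketch below assumes the closed form $c_2 = 2B(\beta+1,\beta+1)(2\beta+1)/(3(1-\beta))$, which is what one gets by solving the linear equation for $c_2$ with $\beta^2 = 2 - 3\beta$, together with $K_2 = c_2 - 1$ and $K_3 = c_2 B(\beta+1,\beta+1) - B(\beta/2+1,\beta/2+1)^2$. The helper names are illustrative.

```python
import math

BETA = (math.sqrt(17) - 3) / 2


def beta_fn(a, b):
    """Euler Beta function B(a, b) via Gamma functions."""
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)


def second_moment_constant():
    """c_2 = 2 B(beta+1, beta+1) (2 beta + 1) / (3 (1 - beta))."""
    return 2 * beta_fn(BETA + 1, BETA + 1) * (2 * BETA + 1) / (3 * (1 - BETA))


def variance_constants():
    """Return (K_2, K_3) for a fixed query and for a uniform query line."""
    c2 = second_moment_constant()
    k2 = c2 - 1
    k3 = c2 * beta_fn(BETA + 1, BETA + 1) - beta_fn(BETA / 2 + 1, BETA / 2 + 1) ** 2
    return k2, k3


def lipschitz_constant(m):
    """Contraction factor 4 / (beta*m + 1)^2 of the operator for the m-th moment."""
    return 4 / (BETA * m + 1) ** 2
```

A quick consistency check is that `second_moment_constant()` satisfies the linear fixed-point relation for $c_2$ and that `lipschitz_constant(m) < 1` for every `m >= 2`.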

It only remains to prove that there is only one distribution with these moments. We prove that there 
exists Ai > such that 

Cm < Aim^, m > 1, (46) 

which completes the proof of the proposition by the Carleman condition [see, e.g., 14, p. 228] 

Suppose that (46) is satisfied for all $m < m_0$. By Stirling's formula, there exists a constant $A_2$ such that for all $m \ge 1$ and $1 \le \ell < m$



B(/3^ + l,/3(m-^) + l) < 



Ao 



m 



(m - £y 



m" 



Next, the prefactor in (8) is of order $1/m$, hence bounded by $A_3/m$ for some $A_3 > 0$ and all $m \ge 1$. Using this, the induction hypothesis and $x^{\ell}(1 - x)^{m-\ell} \le 1$ for all $x \in [0, 1]$, it follows that



< 



a2 A A ^0-1 

z;;2 ^0 ^0 



< Aim"^' 
if mo is chosen large enough. 



□ 



Concerning applications, the maximum of the process $S_n = \sup_{t\in[0,1]} C_n(t)$ is of great interest since it is the worst-case cost of a partial match query in the quadtree. The uniform convergence of $n^{-\beta}C_n(t)$ directly implies

in distribution with $S = \sup_{t\in[0,1]} Z(t)$ where $Z$ is the process constructed in Section 4. The results obtained so far yield that, stochastically,

$$S \le \Big((UV)^\beta S^{(1)} + (U(1-V))^\beta S^{(2)}\Big) \vee \Big(((1-U)V)^\beta S^{(3)} + ((1-U)(1-V))^\beta S^{(4)}\Big), \tag{47}$$

where $S^{(1)}, \ldots, S^{(4)}$ are independent copies of $S$, also independent of $(U, V)$, which are themselves independent and uniform on $[0, 1]$. We prove that the convergence of the supremum holds for all moments.
Theorem 12 and Corollary 21 in [32] provide uniform integrability of 5*^ which allows us to prove the 
following Theorem: 

Theorem 21. We have $\mathbf{E}[S^m] < \infty$ for all $m$ and, as $n \to \infty$, $S_n \to S$ in distribution, with convergence of all moments.

Proof Uniform integrability implies that Sn is bounded in and hence also in L^. For higher moments, 
we proceed by induction. Let Bi such that 

for all m < mo and n > 1 with mo > 2. Furthermore, choose B2 such that E [5^°] < B2 for all n < uq. 
Then, the recurrence for Cn{t) provides 




<4^o5f +452E 
Note that, as $n \to \infty$,



E 



(n) \ /3mo' 



^1 

n 



E [{UV) 



/3mol 



{/3mo + 1)" 



thus choosing no and B2 appropriately we have E[S'^°] < B2 since tuq ^ 2. This shows that Sn is 



bounded in and the assertion follows. 



□ 



Note that the argument to extend convergence of the first two moments to moments of arbitrary order 
only involves the recursion that is satisfied by Cn . It is easily generalized to arbitrary sequences obeying 
(12) under weak additional assumptions by the proof given here. 



7 Partial match queries in random 2-d trees 
7.1 2-d trees: constructions and recursions 

The random 2-d tree was introduced by Bentley [1] and is used to store two-dimensional data just as the two- 
dimensional quadtree. It is also called two-dimensional binary search tree since it is binary and mimics the 
construction rule of binary search tree for two-dimensional data. Our aim in this section is to introduce 2-d 
trees, and extend to 2-d trees the results for partial match queries in quadtrees we obtained in the previous 
sections. All the results can be transferred (convergence as a process, convergence of all moments at one 
or multiple points, convergence of the supremum in distribution and for all moments); we will mainly state 
the forms of the theorems for 2-d trees, and focus on the points that deserve some verifications. 

Construction of 2-d trees. The data are partitioned recursively, as in quadtrees, but the splits are only binary; since the data is two-dimensional, one alternates between vertical and horizontal splits, depending on the parity of the level in the tree. More precisely, consider a point sequence $p_1, p_2, \ldots \in [0, 1]^2$. As we build the tree, regions are associated to each node. Initially, the root is associated with the entire square $[0, 1]^2$. The first item $p_1$ is stored at the root, and splits the unit square vertically into two rectangles, which are associated with the two children of the root. More generally, when $i$ points have already been inserted, the tree has $i$ internal nodes, and $i + 1$ (lower level) regions associated to the external nodes and forming a partition of the square $[0, 1]^2$. The point $p_{i+1}$ is stored in the node, say $u$, corresponding to the region it falls in, and divides this region into two sub-rectangles that are associated to the two children of $u$, which become external nodes; that last partition step depends on the parity of the depth of $u$ in the tree: if it is odd we partition horizontally, if it is even we partition vertically. See Figure 3. (Of course, one could start at the root with a horizontal split.)
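
The construction just described, together with a partial match query along a vertical line, can be sketched as follows. This is a minimal illustration with hypothetical names (`Node`, `insert`, `partial_match_cost`); it assumes a vertical split at the root and a vertical query line $\{x = s\}$, so that a vertically splitting node sends the query to one child and a horizontally splitting node sends it to both.

```python
class Node:
    def __init__(self, point, depth):
        self.point = point    # (x, y)
        self.depth = depth    # even depth: vertical split (compare x); odd: horizontal (compare y)
        self.left = None      # points with smaller coordinate on the split axis
        self.right = None     # points with larger or equal coordinate


def insert(root, point, depth=0):
    """Insert a point into the 2-d tree rooted at `root`; return the root."""
    if root is None:
        return Node(point, depth)
    axis = root.depth % 2
    if point[axis] < root.point[axis]:
        root.left = insert(root.left, point, root.depth + 1)
    else:
        root.right = insert(root.right, point, root.depth + 1)
    return root


def partial_match_cost(node, s):
    """Number of nodes visited by the query {x = s}."""
    if node is None:
        return 0
    if node.depth % 2 == 0:
        # Vertical split: the query line lies on exactly one side.
        child = node.left if s < node.point[0] else node.right
        return 1 + partial_match_cost(child, s)
    # Horizontal split: the query line crosses both sub-rectangles.
    return 1 + partial_match_cost(node.left, s) + partial_match_cost(node.right, s)
```

Starting the recursion with `depth=1` at the root would model the other case, in which the first split is horizontal, that is, perpendicular to the query line.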




Figure 3: An example of a 2-d tree is shown: on the left, the partition of $[0, 1]^2$ induced by the points; on the right, the corresponding binary tree. In red, the nodes visited when performing the partial match query materialized by the dashed vertical line.



Partial match queries. From now on, we assume that data consists of a set of independent random points, uniformly distributed on the unit square. Unlike in the case of quadtrees, the direction of a partial match query line with respect to the direction of the root does matter. Let $C_n^{=}(t)$ and $C_n^{\perp}(t)$ denote the number of nodes visited by a partial match for a query at position $t \in [0, 1]$ when the directions of the split at the root and the query are parallel and perpendicular, respectively. Subsequently, we will analyze both quantities synchronously as far as possible. We will always consider directions with respect to the query line, and although some of the expressions (for the sizes of the regions for instance) will be symmetric, we keep them distinct for the sake of clarity. (We also assume without loss of generality that the query line is always vertical, and that the direction of the cut at the root may change.)

As in a quadtree, a node is visited by a partial match query if and only if it is inserted in a subregion 
that intersects the query line. Unfortunately, these nodes are not easily identifiable after the insertion of 
$n$ points; the value of the quantity $C_n^{=}(s)$ is obtained by adding twice the number of lines intersecting the 
query line at $s$ and the number of boxes that are intersected by the query line and will have their next split 
perpendicular to the query line (that is, the depth of the corresponding external nodes in the tree has odd 
parity). 
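The construction and the visiting rule described above can be illustrated by a short Python sketch. It is only an illustration under stated assumptions: the names `Node`, `insert`, `build` and `partial_match` are hypothetical and do not come from the paper, the query line is taken to be the vertical line $\{x = s\}$, and the flag `root_vertical` selects whether the split at the root is parallel (`True`) or perpendicular (`False`) to the query line.

```python
import random


class Node:
    """Internal node of a 2-d tree; it stores one data point."""

    def __init__(self, point, depth):
        self.point = point   # (x, y) coordinates
        self.depth = depth   # the root has depth 0
        self.left = None     # sub-region with the smaller coordinate
        self.right = None    # sub-region with the larger coordinate


def is_vertical(depth, root_vertical):
    """Split directions alternate with the depth, starting from the root."""
    return (depth % 2 == 0) == root_vertical


def insert(root, point, root_vertical=True):
    """Insert a point and return the root of the tree."""
    if root is None:
        return Node(point, 0)
    node = root
    while True:
        # A vertical split compares x-coordinates, a horizontal one y-coordinates.
        axis = 0 if is_vertical(node.depth, root_vertical) else 1
        side = "left" if point[axis] < node.point[axis] else "right"
        child = getattr(node, side)
        if child is None:
            setattr(node, side, Node(point, node.depth + 1))
            return root
        node = child


def build(points, root_vertical=True):
    """Insert the points one after the other, in the given order."""
    root = None
    for p in points:
        root = insert(root, p, root_vertical)
    return root


def partial_match(node, s, root_vertical=True):
    """Number of nodes visited by the query line {x = s}."""
    if node is None:
        return 0
    if is_vertical(node.depth, root_vertical):
        # Split parallel to the query line: only one side is explored.
        child = node.left if s < node.point[0] else node.right
        return 1 + partial_match(child, s, root_vertical)
    # Split perpendicular to the query line: both sides are explored.
    return (1 + partial_match(node.left, s, root_vertical)
            + partial_match(node.right, s, root_vertical))


if __name__ == "__main__":
    rng = random.Random(0)
    pts = [(rng.random(), rng.random()) for _ in range(1000)]
    print(partial_match(build(pts, True), 0.5, True))
    print(partial_match(build(pts, False), 0.5, False))
```

The two branches of `partial_match` mirror the two situations distinguished in the text: a split parallel to the query line sends the search to a single child, a perpendicular split sends it to both.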

Recursive decompositions. Let $(U, V)$ be the first point, which partitions the unit square. By construction, 
since the directions of the partitioning lines alternate, both processes $C_n^{=}(t)$ and $C_n^{\perp}(t)$ are coupled: 
when the query line is perpendicular to the split direction, the recursive search occurs in both child 
sub-regions, whose sizes we denote by $N_n$ and $S_n$, and we have 

$$C_n^{\perp}(s) \stackrel{d}{=} 1 + C_{N_n}^{(=,1)}(s) + C_{S_n}^{(=,2)}(s); \qquad (48)$$

when the query line and the first split at the root are parallel, only one of the sub-regions (of sizes $L_n$ and 
$R_n$) is recursively visited and we have 

$$C_n^{=}(s) \stackrel{d}{=} 1 + \mathbf{1}_{\{s<U\}}\, C_{L_n}^{(\perp,1)}\!\left(\frac{s}{U}\right) + \mathbf{1}_{\{s\ge U\}}\, C_{R_n}^{(\perp,2)}\!\left(\frac{s-U}{1-U}\right). \qquad (49)$$

Here $(C_n^{(=,1)})_{n\ge 0}$, $(C_n^{(=,2)})_{n\ge 0}$ are independent copies of $(C_n^{=})_{n\ge 0}$, independent of $(N_n, S_n)$ in (48) and 
$(C_n^{(\perp,1)})_{n\ge 0}$, $(C_n^{(\perp,2)})_{n\ge 0}$ are independent copies of $(C_n^{\perp})_{n\ge 0}$, independent of $(L_n, R_n)$ in (49). Moreover, 
here and in the following, distributional recurrences and fixed-point equations involving a parameter $s \in [0,1]$ 
are to be understood on the level of càdlàg or continuous functions unless stated otherwise. 

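As in the case of partial match in random quadtrees, the expected value at a random uniform query line 
$\xi$, independent of the tree, is of order $n^{\beta}$ for the same constant $\beta$ defined in (1), and we have 

$$\mathbf{E}[C_n^{=}(\xi)] \sim \kappa_{=}\, n^{\beta}, \qquad \mathbf{E}[C_n^{\perp}(\xi)] \sim \kappa_{\perp}\, n^{\beta},$$

for some constants $\kappa_{=} > 0$, $\kappa_{\perp} > 0$. This was first proved by Flajolet and Puech [17]. A more detailed 
analysis by Chern and Hwang [5] shows that 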

E[C,|«)l=».»*-3 + 0(,/-'). = . EM. 

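Observe that $\kappa_{=} = \frac{13}{2}(3 - 5\beta)\,\kappa$ and $\kappa_{\perp} = 13(2\beta - 1)\,\kappa$, where $\kappa$ is the leading constant for $\mathbf{E}[C_n(\xi)]$ in 
the case of quadtrees defined in (1); both $\kappa_{=}$ and $\kappa_{\perp}$ are larger than $\kappa$. 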
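The two ratios can be checked numerically. The sketch below is a sanity check rather than part of the argument. It assumes that the constant defined in (1) is $\beta = (\sqrt{17}-3)/2$ and that the relations read $\kappa_{=}/\kappa = \frac{13}{2}(3-5\beta)$ and $\kappa_{\perp}/\kappa = 13(2\beta-1)$; the variable names are hypothetical. It also verifies the identity $\kappa_{\perp} = \frac{2}{\beta+1}\kappa_{=}$, which is what taking expectations in (48) suggests, since a uniform $V$ satisfies $\mathbf{E}[V^{\beta}] = 1/(\beta+1)$.

```python
import math

# Assumed value of the exponent beta defined in (1).
beta = (math.sqrt(17) - 3) / 2

# Ratios of the 2-d tree constants to the quadtree constant kappa.
ratio_parallel = 13 * (3 - 5 * beta) / 2      # kappa_= / kappa
ratio_perpendicular = 13 * (2 * beta - 1)     # kappa_perp / kappa

# Relation suggested by (48): kappa_perp = 2 * E[V^beta] * kappa_=.
predicted_perpendicular = 2 * ratio_parallel / (beta + 1)

if __name__ == "__main__":
    print(f"beta                = {beta:.6f}")
    print(f"kappa_= / kappa     = {ratio_parallel:.6f}")
    print(f"kappa_perp / kappa  = {ratio_perpendicular:.6f}")
    print(f"2 kappa_= / (beta+1) / kappa = {predicted_perpendicular:.6f}")
```

Both ratios come out larger than one (roughly 1.25 and 1.60), in line with the remark that $\kappa_{=}$ and $\kappa_{\perp}$ exceed $\kappa$.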

HOMOGENEOUS RECURSIVE RELATIONS AND LIMIT BEHAVIOUR. For our purposes, and although it 
yields more complex expressions, it is more convenient to expand the recursion one more level to obtain 
recursive relations that only involve quantities of the same type, only $(C_n^{=})_{n\ge 0}$ or only $(C_n^{\perp})_{n\ge 0}$: each one 
of the first two sub-regions at the root is eventually split, and this gives rise to a partition into four regions 
at level two of the tree. Let $(U_\ell, V_\ell)$ and $(U_r, V_r)$ be respectively the first points on each side (left and right) 
of the first cut, when it is parallel to the query line. Let also $(U_u, V_u)$ and $(U_d, V_d)$ be the first points on each 
side of the cut (up and down) when it is perpendicular to the query line. Note that $U, V_\ell, V_r$ are independent 
and uniform on $[0,1]$, and so are $U_u$ and $U_d$. 

Let $I_{=,1}^{(n)}, \dots, I_{=,4}^{(n)}$ and $I_{\perp,1}^{(n)}, \dots, I_{\perp,4}^{(n)}$ denote the number of data points falling in these regions when 
the split at the root and the query line are parallel and perpendicular, respectively. The distributions of $I_{=,1}^{(n)}, \dots, I_{=,4}^{(n)}$ on 
the one hand, and $I_{\perp,1}^{(n)}, \dots, I_{\perp,4}^{(n)}$ on the other hand are slightly more involved than in the case of quadtrees. 
For example, given the values of $U, V_\ell, V_r$, it holds that 

$$I_{=,1}^{(n)} \stackrel{d}{=} \mathrm{Bin}\big((\mathrm{Bin}(n-1; U) - 1)^{+},\, V_\ell\big)$$

and, given $V, U_d, U_u$, 

$$I_{\perp,1}^{(n)} \stackrel{d}{=} \mathrm{Bin}\big((\mathrm{Bin}(n-1; V) - 1)^{+},\, U_d\big),$$

where the inner and outer binomials are independent. Analogous expressions hold true for the remaining 
quantities. 

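Substituting (48) and (49) into each other gives 

$$C_n^{=}(s) \stackrel{d}{=} 1 + \mathbf{1}_{\{s<U\}}\left[\mathbf{1}_{\{L_n>0\}} + C^{(=,1)}_{I_{=,1}^{(n)}}\!\left(\frac{s}{U}\right) + C^{(=,2)}_{I_{=,2}^{(n)}}\!\left(\frac{s}{U}\right)\right] + \mathbf{1}_{\{s\ge U\}}\left[\mathbf{1}_{\{R_n>0\}} + C^{(=,3)}_{I_{=,3}^{(n)}}\!\left(\frac{s-U}{1-U}\right) + C^{(=,4)}_{I_{=,4}^{(n)}}\!\left(\frac{s-U}{1-U}\right)\right] \qquad (52)$$

and 

$$C_n^{\perp}(s) \stackrel{d}{=} 1 + \mathbf{1}_{\{S_n>0\}} + \mathbf{1}_{\{N_n>0\}} + \mathbf{1}_{\{s<U_d\}} C^{(\perp,1)}_{I_{\perp,1}^{(n)}}\!\left(\frac{s}{U_d}\right) + \mathbf{1}_{\{s\ge U_d\}} C^{(\perp,2)}_{I_{\perp,2}^{(n)}}\!\left(\frac{s-U_d}{1-U_d}\right) + \mathbf{1}_{\{s<U_u\}} C^{(\perp,3)}_{I_{\perp,3}^{(n)}}\!\left(\frac{s}{U_u}\right) + \mathbf{1}_{\{s\ge U_u\}} C^{(\perp,4)}_{I_{\perp,4}^{(n)}}\!\left(\frac{s-U_u}{1-U_u}\right), \qquad (53)$$

where $(C_n^{(=,i)})_{n\ge 0}$, $i = 1, \dots, 4$, are independent copies of $(C_n^{=})_{n\ge 0}$, which are also independent of 
the family $(U, I_{=,1}^{(n)}, I_{=,2}^{(n)}, I_{=,3}^{(n)}, I_{=,4}^{(n)})$ in (52), and $(C_n^{(\perp,i)})_{n\ge 0}$, $i = 1, \dots, 4$, are independent copies of 
$(C_n^{\perp})_{n\ge 0}$, which are also independent of $(U_d, U_u, I_{\perp,1}^{(n)}, I_{\perp,2}^{(n)}, I_{\perp,3}^{(n)}, I_{\perp,4}^{(n)})$ in (53). Asymptotically, any limit 
$Z^{=}(s)$ of $n^{-\beta}C_n^{=}(s)$ should satisfy the following fixed-point equation 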



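$$Z^{=}(s) \stackrel{d}{=} \mathbf{1}_{\{s<U\}}\left[(UV_\ell)^{\beta} Z^{(=,1)}\!\left(\frac{s}{U}\right) + (U(1-V_\ell))^{\beta} Z^{(=,2)}\!\left(\frac{s}{U}\right)\right] + \mathbf{1}_{\{s\ge U\}}\left[((1-U)V_r)^{\beta} Z^{(=,3)}\!\left(\frac{s-U}{1-U}\right) + ((1-U)(1-V_r))^{\beta} Z^{(=,4)}\!\left(\frac{s-U}{1-U}\right)\right], \qquad (54)$$

where $Z^{(=,i)}$, $i = 1, \dots, 4$, are independent copies of $Z^{=}$, independent of $(U, V_\ell, V_r)$. Likewise any limit 
$Z^{\perp}(s)$ of $n^{-\beta}C_n^{\perp}(s)$ should satisfy 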



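$$Z^{\perp}(s) \stackrel{d}{=} \mathbf{1}_{\{s<U_d\}}(U_d V)^{\beta} Z^{(\perp,1)}\!\left(\frac{s}{U_d}\right) + \mathbf{1}_{\{s\ge U_d\}}((1-U_d)V)^{\beta} Z^{(\perp,2)}\!\left(\frac{s-U_d}{1-U_d}\right) + \mathbf{1}_{\{s<U_u\}}(U_u(1-V))^{\beta} Z^{(\perp,3)}\!\left(\frac{s}{U_u}\right) + \mathbf{1}_{\{s\ge U_u\}}((1-U_u)(1-V))^{\beta} Z^{(\perp,4)}\!\left(\frac{s-U_u}{1-U_u}\right), \qquad (55)$$

where $Z^{(\perp,i)}$, $i = 1, \dots, 4$, are independent copies of $Z^{\perp}$, independent of $(U_d, U_u, V)$. Moreover, 
according to (48) and (49), we expect a connection between these two limits. This will be stated in the first result 
of the next section and always allows us to focus on $C_n^{=}(s)$ first. Results for $C_n^{\perp}(s)$ can then be deduced easily 
afterwards. 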
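Taking expectations in (54) gives a deterministic equation for the mean function, which can be checked numerically. The sketch below assumes that the mean of the limit is $h(s) = (s(1-s))^{\beta/2}$ with $\beta = (\sqrt{17}-3)/2$ (both taken from outside this section, hence assumptions here), and it evaluates the expectation of the right-hand side of (54) with every copy of $Z^{=}$ replaced by $h$. Since $V_\ell$ and $V_r$ are uniform and independent of $U$, the factors $V^{\beta}$ and $(1-V)^{\beta}$ each contribute $1/(\beta+1)$. The function names and the midpoint rule are illustrative choices only.

```python
import math

BETA = (math.sqrt(17) - 3) / 2   # assumed value of beta from (1)


def h(s):
    """Assumed mean function of the limit process."""
    return (s * (1 - s)) ** (BETA / 2)


def mean_rhs(s, steps=20000):
    """Midpoint-rule value of E[right-hand side of (54)] with Z replaced by h."""
    total = 0.0
    # Event {s < U}: integrate u^beta * h(s / u) over u in (s, 1).
    width = (1 - s) / steps
    for k in range(steps):
        u = s + (k + 0.5) * width
        total += u ** BETA * h(s / u) * width
    # Event {s >= U}: integrate (1 - u)^beta * h((s - u) / (1 - u)) over u in (0, s).
    width = s / steps
    for k in range(steps):
        u = (k + 0.5) * width
        total += (1 - u) ** BETA * h((s - u) / (1 - u)) * width
    # Two terms on each side, each carrying E[V^beta] = 1 / (beta + 1).
    return 2 / (BETA + 1) * total


if __name__ == "__main__":
    for s in (0.1, 0.3, 0.5, 0.9):
        print(s, h(s), mean_rhs(s))
```

The two columns agree up to the quadrature error; the integrals can in fact be computed in closed form, and the agreement reduces to the identity $(\beta+1)(\beta+2) = 4$.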



7.2 About the conditions to use the contraction argument 

Existence of continuous limit processes. As in the case of quadtrees, one of the first steps consists 
in showing the existence of the limit processes $Z^{=}$ and $Z^{\perp}$. 

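Proposition 22. There exist two random continuous processes $Z^{=}, Z^{\perp}$ with $\mathbf{E}[Z^{=}(s)] = \mathbf{E}[Z^{\perp}(s)] = 
h(s)$ and finite absolute moments of all orders such that $Z^{=}$ satisfies (54) and $Z^{\perp}$ satisfies (55). The laws of 
$Z^{=}$ and $Z^{\perp}$ are both unique under these constraints. Additionally, 

• we have 

$$Z^{=}(s) \stackrel{d}{=} \frac{2}{\beta+1}\left[\mathbf{1}_{\{s<U\}}\, U^{\beta} Z^{(\perp,1)}\!\left(\frac{s}{U}\right) + \mathbf{1}_{\{s\ge U\}}\, (1-U)^{\beta} Z^{(\perp,2)}\!\left(\frac{s-U}{1-U}\right)\right]$$

and 

$$Z^{\perp}(s) \stackrel{d}{=} \frac{\beta+1}{2}\left[V^{\beta} Z^{(=,1)}(s) + (1-V)^{\beta} Z^{(=,2)}(s)\right]; \qquad (56)$$

• for every fixed $s \in [0,1]$, $Z^{=}(s)$ is distributed like $Z(s)$ where $Z$ is the process constructed in 
Section 4. In particular, $\operatorname{Var}(Z^{=}(s))$ is given in (42) and $\operatorname{Var}(Z^{\perp}(s)) = K_2^{\perp} h^2(s)$, where 

$$K_2^{\perp} = \left(\frac{2c_2}{2\beta+1} + 2B(\beta+1, \beta+1)\right)\left(\frac{\beta+1}{2}\right)^{2} - 1, \qquad (57)$$

with $c_2$ defined in (44); 

• if $\xi$ is uniform on $[0,1]$ and independent of $Z^{=}, Z^{\perp}$, then $\operatorname{Var}(Z^{=}(\xi))$ is given by (43) and 

$$\operatorname{Var}(Z^{\perp}(\xi)) = K_3^{\perp} = \left(\frac{2c_2}{2\beta+1} + 2B(\beta+1, \beta+1)\right)\left(\frac{\beta+1}{2}\right)^{2} B(\beta+1, \beta+1) - B\!\left(\tfrac{\beta}{2}+1, \tfrac{\beta}{2}+1\right)^{2}. \qquad (58)$$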

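Proof. The fixed-point equation (54) is very similar to that in (11), and we use the approach that has proved 
fruitful in Section 4. More precisely, the construction of $Z(s)$ is slightly modified to obtain $Z^{=}(s)$. Define the 
operator $G^{=} : [0,1]^3 \times C[0,1]^4 \to C[0,1]$ by 

$$G^{=}(x, y, z, f_1, f_2, f_3, f_4)(s) = \mathbf{1}_{\{s<x\}}\left[(xy)^{\beta} f_1\!\left(\frac{s}{x}\right) + (x(1-y))^{\beta} f_2\!\left(\frac{s}{x}\right)\right] + \mathbf{1}_{\{s\ge x\}}\left[((1-x)z)^{\beta} f_3\!\left(\frac{s-x}{1-x}\right) + ((1-x)(1-z))^{\beta} f_4\!\left(\frac{s-x}{1-x}\right)\right].$$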

Then let (as in Section 4) 

C4"i = G=(C/„, K, Z='«\ Z='"2, Z=>«3, Z='«4), Zo='" = h{s\ 

for all u ^ T, where {/7^,v G T},{K,^ ^ T} and G T} are three independent families of 

i.i.d. [0, l]-uniform random variables. Lemma 10 remains true for Z^ := Z^'^ since equals Wn 
in distribution where Wn appears in (27). Since also and Ln (appearing in Lemma 28) coincide in 
distribution, (24) holds true for Z^ and therefore Proposition 9 remains valid. The existence of all moments 
of $\sup_{s\in[0,1]} Z^{=}(s)$ follows in the same way. Finally, note that $Z_n^{=}(s)$ is distributed as $Z_n(s)$ for all fixed 
$n, s$, hence the one-dimensional distributions of $Z^{=}$ and $Z$ coincide. It is now easy to see that $Z^{\perp}$ defined 
by (56) solves (55). The uniqueness of $Z^{=}(s)$ (resp. $Z^{\perp}(s)$) follows by contraction with respect to the $\zeta_2$ 
metric, compare Lemma 18 in [32]. Finally, the variance of $Z^{\perp}(s)$ can be computed as in Section 6, but it 
is much easier to use (56); we omit the calculations. □ 

Uniform convergence of the mean. Comparing the construction and the recurrence for partial match queries 
in 2-d trees and quadtrees, it seems very likely that these quantities are not only of the same asymptotic order 
in the case of a uniform query but also closely related for fixed $s \in [0,1]$ and $n \in \mathbb{N}$. This can be formalized 
by the following lemma. 

Lemma 23. For any $s \in [0,1]$ and $n \in \mathbb{N}$ we have 

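$$\tfrac{1}{2}\,\mathbf{E}[C_n(s)] \le \mathbf{E}[C_n^{=}(s)] \le 2\,\mathbf{E}[C_n(s)].$$

Proof. We prove both bounds by induction on $n$ using the recursive decompositions (10) and (52). Both 
inequalities are obviously true for $n = 0, 1$. Assume that the assertions are true for all $m \le n-1$ and 
$s \in [0,1]$. We start with the upper bound, which is easier. By (52), we have 

$$\mathbf{E}[C_n^{=}(s)] \le 2 + \mathbf{E}\left[\mathbf{1}_{\{s<U\}}\left(C^{(=,1)}_{I_{=,1}^{(n)}}\!\left(\frac{s}{U}\right) + C^{(=,2)}_{I_{=,2}^{(n)}}\!\left(\frac{s}{U}\right)\right)\right] + \mathbf{E}\left[\mathbf{1}_{\{s\ge U\}}\left(C^{(=,3)}_{I_{=,3}^{(n)}}\!\left(\frac{s-U}{1-U}\right) + C^{(=,4)}_{I_{=,4}^{(n)}}\!\left(\frac{s-U}{1-U}\right)\right)\right].$$

Hence, comparing with (10), it suffices to show that 

$$\mathbf{E}\left[\mathbf{1}_{\{s<U\}}\, C^{(=,1)}_{I_{=,1}^{(n)}}\!\left(\frac{s}{U}\right)\right] \le 2\,\mathbf{E}\left[\mathbf{1}_{\{s<U\}}\, C^{(1)}_{I_{1}^{(n)}}\!\left(\frac{s}{U}\right)\right],$$

the remaining three terms being treated in the same way. This can be done in two steps. First, by conditioning 
on $I_{=,1}^{(n)}$ and $U$, using the induction hypothesis, we have 

$$\mathbf{E}\left[\mathbf{1}_{\{s<U\}}\, C^{(=,1)}_{I_{=,1}^{(n)}}\!\left(\frac{s}{U}\right)\right] \le 2\,\mathbf{E}\left[\mathbf{1}_{\{s<U\}}\, C^{(1)}_{I_{=,1}^{(n)}}\!\left(\frac{s}{U}\right)\right].$$

Finally, conditioning on $U$, $I_{=,1}^{(n)}$ is stochastically smaller than $I_{1}^{(n)}$, which gives 

$$2\,\mathbf{E}\left[\mathbf{1}_{\{s<U\}}\, C^{(1)}_{I_{=,1}^{(n)}}\!\left(\frac{s}{U}\right)\right] \le 2\,\mathbf{E}\left[\mathbf{1}_{\{s<U\}}\, C^{(1)}_{I_{1}^{(n)}}\!\left(\frac{s}{U}\right)\right]$$

by monotonicity of $n \mapsto \mathbf{E}[C_n(s)]$. For the lower bound, note that 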



E[C=(.)]>1 + E 
+ E 

Therefore, it is enough to prove 



^{s<U} 



L{s>t/} 



C 



-in,) 



E 



^{s<U} 



> 



E 



s-U 
l-U 



C 



(") 



S-U 

1 - u 



This can be done as for the upper bound. First, by the induction hypothesis, we have 



E 



'^{s<U}C 



iL:l Ku) 



>iE 

- 5 



'^{s<U}C 



The result follows as for the upper bound by the fact that is stochastically larger than — 1)^ and 



iL:\ \u) 



r(n) 



Recalling (50) and (51), it is natural to introduce the constants 



□ 



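$$K_1^{=} = \frac{\kappa_{=}}{B\!\left(\frac{\beta}{2}+1, \frac{\beta}{2}+1\right)} \quad \text{with} \quad K_1^{\perp} = \frac{2}{\beta+1}\, K_1^{=}, \qquad (59)$$

and the functions $\mu_1^{=}(s) = K_1^{=} h(s)$ and $\mu_1^{\perp}(s) = K_1^{\perp} h(s)$. 

Proposition 24. There exists $\varepsilon^{=} > 0$ such that 

$$\sup_{s\in[0,1]} \left| n^{-\beta}\,\mathbf{E}[C_n^{=}(s)] - \mu_1^{=}(s) \right| = O(n^{-\varepsilon^{=}}),$$

and the analogous result holds true for $\mathbf{E}[C_n^{\perp}(s)]$. 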

We proceed as in Section 5 by considering the continuous-time process $P_t^{=}(s)$. Since we have already 
proved an analogous result for the case of quadtrees, we give a brief sketch that focuses on the few locations 
where the arguments have to be modified. 

Sketch of proof. The first step is to prove pointwise convergence, which is done as in Curien and Joseph [6]. 
By Lemma 23, using a Poisson($t$) number of points, we have 

$$\tfrac{1}{2}\,\mathbf{E}[P_t(s)] \le \mathbf{E}[P_t^{=}(s)] \le 2\,\mathbf{E}[P_t(s)]. \qquad (60)$$

Let be the arrival time of the first point which yields a partitioning line that intersects the query line 
{x = 5}, and let Qf = QT{s) be the lower of the two rectangles created by this cut (for the expected value 
we are about to compute, they both look the same). Let := (5) be the relative position of the query 
line s within the rectangle Qf and = Leb((5f ). Then, denoting r the arrival time of the first point in 
the process, we have 

E[Pr(s)] = P (t > r) + P (t > Tf) + 2E[P^=,_,= (er)], 

where {P^{t))t>o denotes an independent copy of {P^{t))t>o and P^{t) = for t < 0. Similarly, let 
be the arrival time of the first point which cuts Q^_i perpendicularly to the query line. Let be the 
lower of the two rectangles created by this cut, and let be the position of the query line s relative to the 
rectangle Q^. With this notation and = Leb((5^), we have 

nPfis)] = Quit) + 2^E[P^=,_=(4=)], 

where < g^{t) < 2^+^ 

We need to modify the inter-arrival times — ~ ^k-i- ^P^^^ the time it takes for 

the first vertical point to fall in Q^_i which we denote by CV^ remaining time by CV^- Letting 

= Leb((5^), the normalized versions of the inter- arrival times with unit mean are 

Write Mk = Mk/M^^i. Observe that, given M^, . . . , M^, the random variable = • is not 
independent of {£,i)o<£<k, a property which is used in [6] and in the proof of Lemma 15 in the present paper. 
However we can use the trivial lower bound < and the upper bound obtained by bounding C'^ from 
above by C^'^/M^_^. Then, using almost sure monotonicity of Pt{s) (in t) and (60) to transform bounds 
for the mean in the quadtree to bounds in the 2-d tree (and vice versa), it is easy to see that the techniques 
of Section 4 in [6] work equally well in this case. The limit $\mu_1^{=}(s)$ is identified as in Section 5 of [6] since 
both limits satisfy the same fixed-point equation. 

The generalization to uniform convergence with polynomial rate can be worked out as in Section 5 (of 
the present document) using the modifications we have described above. The constants appearing in the 
course of Section 5 need to be modified, but $\varepsilon^{=}$ may be chosen to equal the value of $\varepsilon$ in Proposition 13. 
The depoissonization of Section 5.3 goes through without any modification. 

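Finally, we indicate how to proceed with $\mathbf{E}[C_n^{\perp}(s)]$. The arguments above can be used to prove 
uniform convergence of $n^{-\beta}\,\mathbf{E}[C_n^{\perp}(s)]$ on $[0,1]$; we present a direct approach relying on (48). We have 

$$n^{-\beta}\,\mathbf{E}[C_n^{\perp}(s)] = n^{-\beta} + 2n^{-\beta}\int_0^1 \sum_{k=1}^{n-1} k^{\beta}\left(\mu_1^{=}(s) + O(k^{-\varepsilon^{=}})\right)\mathbf{P}(\mathrm{Bin}(n-1, v) = k)\,dv$$

$$= n^{-\beta} + 2\mu_1^{=}(s)\cdot n^{-\beta}\,\mathbf{E}\left[\mathrm{Bin}(n-1, V)^{\beta}\right] + O\!\left(n^{-\beta}\,\mathbf{E}\left[\mathrm{Bin}(n-1, V)^{\beta-\varepsilon^{=}}\right]\right)$$

$$= \mu_1^{\perp}(s) + O(n^{-\varepsilon^{=}}),$$

uniformly in $s \in [0,1]$ using Minkowski's inequality, the concentration for binomial in (18), and (59) for 
the first term and Jensen's inequality for the second. □ 

7.3 The limiting behaviour in 2-d trees 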

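We are finally ready to state the version of our main result for 2-d trees. It is proved along the same lines 
we used for the case of quadtrees, and we omit the details. 

Theorem 25. With the processes $Z^{=}$ and $Z^{\perp}$ of Proposition 22 we have 

$$\left(\frac{C_n^{=}(s)}{K_1^{=} n^{\beta}}\right)_{s\in[0,1]} \to \left(Z^{=}(s)\right)_{s\in[0,1]}, \qquad \left(\frac{C_n^{\perp}(s)}{K_1^{\perp} n^{\beta}}\right)_{s\in[0,1]} \to \left(Z^{\perp}(s)\right)_{s\in[0,1]}$$

in distribution in $\mathcal{D}[0,1]$ endowed with the Skorokhod topology. Here $K_1^{=}$ and $K_1^{\perp}$ are defined in (59). For 
$s \in [0,1]$, 

$$n^{-\beta}\,\mathbf{E}[C_n^{=}(s)] \to K_1^{=}\,(s(1-s))^{\beta/2}, \qquad n^{-2\beta}\operatorname{Var}(C_n^{=}(s)) \to (K_1^{=})^2 K_2\,(s(1-s))^{\beta},$$

and 

$$n^{-\beta}\,\mathbf{E}[C_n^{\perp}(s)] \to K_1^{\perp}\,(s(1-s))^{\beta/2}, \qquad n^{-2\beta}\operatorname{Var}(C_n^{\perp}(s)) \to (K_1^{\perp})^2 K_2^{\perp}\,(s(1-s))^{\beta},$$

where $K_2$ is given in (42) and $K_2^{\perp}$ in (57). 

If $\xi$ is uniformly distributed on $[0,1]$, independent of $(C_n^{=})_{n\ge 0}$, $(C_n^{\perp})_{n\ge 0}$ and $Z^{=}, Z^{\perp}$, then 

$$\frac{C_n^{=}(\xi)}{K_1^{=} n^{\beta}} \to Z^{=}(\xi), \qquad \frac{C_n^{\perp}(\xi)}{K_1^{\perp} n^{\beta}} \to Z^{\perp}(\xi)$$

in distribution, with convergence of the first two moments in both cases. In particular 

$$\operatorname{Var}(C_n^{=}(\xi)) \sim K_4^{=}\, n^{2\beta}, \qquad \operatorname{Var}(C_n^{\perp}(\xi)) \sim K_4^{\perp}\, n^{2\beta},$$

where 

$$K_4^{=} = (K_1^{=})^2 K_3 \approx 0.69848, \qquad K_4^{\perp} = (K_1^{\perp})^2 K_3^{\perp} \approx 0.77754,$$

with $K_3$ in (43) and $K_3^{\perp}$ in (58). 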
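The two numerical constants can be cross-checked against each other. The sketch below is a consistency check under several assumptions that are not stated in this section and should be read as such: the exponent is $\beta = (\sqrt{17}-3)/2$, the quadtree constant of (1) is $\kappa = \Gamma(2\beta+2)/(2\Gamma(\beta+1)^3)$, the constants satisfy $\kappa_{=} = \frac{13}{2}(3-5\beta)\kappa$ and $K_1^{=} = \kappa_{=}/B(\beta/2+1,\beta/2+1)$, and for fixed $s$ one has $Z(s) \stackrel{d}{=} Z\cdot(s(1-s))^{\beta/2}$ with $c_2 = \mathbf{E}[Z^2]$. Starting from the quoted value $0.69848$, the code recovers $\operatorname{Var}(Z(\xi))$ and $c_2$, and then evaluates the variance of $Z^{\perp}(\xi)$ through the representation $Z^{\perp} = \frac{\beta+1}{2}(V^{\beta}Z + (1-V)^{\beta}Z')$. All variable names are hypothetical.

```python
import math


def beta_fn(a, b):
    """Euler Beta function B(a, b)."""
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)


beta = (math.sqrt(17) - 3) / 2                                      # assumed
kappa = math.gamma(2 * beta + 2) / (2 * math.gamma(beta + 1) ** 3)  # assumed
kappa_par = 13 * (3 - 5 * beta) / 2 * kappa

b_half = beta_fn(beta / 2 + 1, beta / 2 + 1)   # E[(xi(1-xi))^(beta/2)]
b_full = beta_fn(beta + 1, beta + 1)           # E[(xi(1-xi))^beta]

k1_par = kappa_par / b_half
k1_perp = 2 * k1_par / (beta + 1)

K4_PAR_QUOTED = 0.69848                        # value quoted in the theorem
var_z_xi = K4_PAR_QUOTED / k1_par ** 2         # Var(Z(xi))
c2 = (var_z_xi + b_half ** 2) / b_full         # E[Z^2]

# Second moment of Z_perp = (beta+1)/2 * (V^beta Z + (1-V)^beta Z').
second_moment = ((beta + 1) / 2) ** 2 * (2 * c2 / (2 * beta + 1) + 2 * b_full)
var_zperp_xi = second_moment * b_full - b_half ** 2
k4_perp = k1_perp ** 2 * var_zperp_xi

if __name__ == "__main__":
    print(f"c2            = {c2:.6f}")
    print(f"Var(Zperp(xi)) = {var_zperp_xi:.6f}")
    print(f"K4_perp       = {k4_perp:.5f}")
```

Under these assumptions the computed leading constant for $\operatorname{Var}(C_n^{\perp}(\xi))$ agrees with the quoted value $0.77754$ to about three decimal places, which supports the internal consistency of the constants.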

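Note that since $Z^{=}(s)$ equals $Z(s)$ in distribution for fixed $s \in [0,1]$, we can characterize $Z^{=}(s)$ as in 
(41). Equation (56) together with Proposition 22 implies that for fixed $s \in [0,1]$ 

$$Z^{\perp}(s) \stackrel{d}{=} Z^{\perp}\cdot (s(1-s))^{\beta/2},$$

with 

$$Z^{\perp} = \frac{\beta+1}{2}\left(V^{\beta} Z + (1-V)^{\beta} Z'\right),$$

where $Z'$ is an independent copy of $Z$, $Z$ being defined in Proposition 20, and $V$ is independent of $(Z, Z')$. 
In particular, we have 

$$\mathbf{E}\left[(Z^{\perp})^{m}\right] = \left(\frac{\beta+1}{2}\right)^{m} \sum_{\ell=0}^{m} \binom{m}{\ell} B(\beta\ell+1, \beta(m-\ell)+1)\, c_{\ell}\, c_{m-\ell},$$

for $m \ge 2$, where $c_m = \mathbf{E}[Z^m]$ satisfies recursion (8) and $c_0 = c_1 = 1$. 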

Also, as in the quadtree case, it is possible to give convergence of mixed moments of arbitrary order, 
compare Theorem 6, and distributional and moment convergence of the suprema of the processes after 
rescaling as in Theorem 4. 

A About the geometry of random quadtrees 

Lemma 26. Let $W_n$ denote the maximum width of a cell at level $n$ in the construction of $Z_n$ and let $c < 1$. 
Then, 

$$\mathbf{P}(W_n \ge c^n) \le (4e\log(1/c))^n.$$

Proof. Let $U_i$, $i \ge 1$, be a family of i.i.d. $[0,1]$-uniform random variables and $E_i$, $i \ge 1$, be a family of i.i.d. 
exponential(1) random variables. Then, the union bound and a large deviations argument yield 

$$\mathbf{P}(W_n \ge c^n) \le 4^n \cdot \mathbf{P}\left(\prod_{i=1}^{n} U_i \ge c^n\right)$$

$$= 4^n \cdot \mathbf{P}\left(\sum_{i=1}^{n} E_i \le n\log(1/c)\right)$$

$$\le 4^n \exp\big(-n(\log(1/c) - 1 - \log\log(1/c))\big)$$

$$\le (4e\log(1/c))^n,$$

as desired. □ 
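The large deviations step can be checked against the exact distribution, since a sum of $n$ independent exponential(1) variables has an Erlang distribution with an explicit distribution function. The sketch below compares the exact lower tail with the bound $\exp(-n(a - 1 - \log a))$ for $a = \log(1/c) \le 1$, and then with the bound of the lemma. The function names are hypothetical and the series is truncated after a fixed number of terms, which is harmless here because the terms decay geometrically when $x \le n$.

```python
import math


def erlang_cdf(n, x, terms=200):
    """P(E_1 + ... + E_n <= x) for i.i.d. exponential(1) variables.

    Uses the tail series exp(-x) * sum_{k >= n} x^k / k!, truncated after
    `terms` terms, which avoids cancellation for small probabilities.
    """
    term = math.exp(-x)
    for k in range(1, n + 1):
        term *= x / k            # term = exp(-x) * x^n / n!
    total = 0.0
    for k in range(n, n + terms):
        total += term
        term *= x / (k + 1)
    return total


def chernoff_lower_tail(n, a):
    """Large deviations bound exp(-n (a - 1 - log a)) for 0 < a <= 1."""
    return math.exp(-n * (a - 1 - math.log(a)))


def lemma_bound(n, c):
    """Right-hand side (4 e log(1/c))^n of the lemma."""
    return (4 * math.e * math.log(1 / c)) ** n


if __name__ == "__main__":
    for c in (0.5, 0.8, 0.95):
        a = math.log(1 / c)
        for n in (1, 10, 40):
            exact = 4 ** n * erlang_cdf(n, n * a)
            print(c, n, exact, 4 ** n * chernoff_lower_tail(n, a), lemma_bound(n, c))
```

In each row the exact value is dominated by the Chernoff-type bound, which in turn is dominated by the bound of the lemma, since the former equals $(4ec\log(1/c))^n$ and $c < 1$.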

Lemma 27. Let $F_k$ be the fill-up level of a random quadtree of size $k$. Then, for every integer 
$x \ge 22$ there exists an integer $n_0(x)$ with 

$$\mathbf{P}(F_{x^n} < n) \le 4^{n+1} x^{-n/100}, \qquad n \ge n_0(x).$$

Proof. We consider the $4^n$ possible nodes in level $n$. By symmetry each of them is occupied by a key 
with the same probability. Looking at a specific one, e.g. the leftmost, it is obvious that its subtree size 
is stochastically bounded by $\mathrm{Bin}(x^n; U_1V_1\cdots U_nV_n)$ where $\{U_i, i \ge 1\}$ and $\{V_i, i \ge 1\}$ are independent 
families of i.i.d. $[0,1]$-uniform random variables. Then by the union bound applied to the $4^n$ cells at level 
$n$, we have 

$$\mathbf{P}(F_{x^n} < n) \le 4^n \cdot \mathbf{P}\left(\mathrm{Bin}(x^n; U_1V_1\cdots U_nV_n) = 0\right)$$

$$= 4^n \cdot \mathbf{E}\left[(1 - U_1V_1\cdots U_nV_n)^{x^n}\right] \le 4^n \cdot \mathbf{E}\left[\exp(-x^n U_1V_1\cdots U_nV_n)\right]$$

$$\le 4^n \cdot \exp(-2^n) + 4^n\, \mathbf{P}\left(U_1V_1\cdots U_nV_n < (2/x)^n\right). \qquad (61)$$

However, using once again the large deviations principle for sums of i.i.d. exponential random variables 
$E_i$, $i \ge 1$, 

$$\mathbf{P}\left(U_1V_1\cdots U_nV_n < (2/x)^n\right) = \mathbf{P}\left(\sum_{i=1}^{2n} E_i > n\log(x/2)\right)$$

$$\le \exp\left(-2n\left(\frac{\log(x/2)}{2} - 1 - \log\frac{\log(x/2)}{2}\right)\right)$$

$$\le x^{-n/100} \qquad (62)$$

for all $x \ge 22$ since then $\frac{e^2}{2}\log^2(x/2) \le x^{99/100}$. Putting (61) and (62) together, we obtain 

$$\mathbf{P}(F_{x^n} < n) \le 4^n \exp(-2^n) + 4^n \cdot x^{-n/100} \le 4^{n+1} x^{-n/100}$$

for $x \ge 22$ and $n$ large enough. □ 
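The role of the threshold $22$ can be made explicit numerically. Writing the exponential bound preceding (62) as $\big(\frac{e^2}{2x}\log^2(x/2)\big)^n$, it is at most $x^{-n/100}$ exactly when $\frac{e^2}{2}\log^2(x/2) \le x^{99/100}$. The sketch below, in which the function name is hypothetical and the form of the condition is derived from the exponent rather than quoted, tests this inequality over a range of integers.

```python
import math


def condition_holds(x):
    """True if (e^2 / 2) * log(x / 2)^2 <= x^(99/100)."""
    return (math.e ** 2 / 2) * math.log(x / 2) ** 2 <= x ** 0.99


# Integers in a finite range for which the condition fails.
failing = [x for x in range(3, 100000) if not condition_holds(x)]

if __name__ == "__main__":
    print("largest failing integer:", max(failing))
    print("condition at x = 22:", condition_holds(22))
```

Within the tested range the inequality fails for the last time at $x = 21$ and holds from $x = 22$ onwards, which matches the restriction $x \ge 22$ in the lemma.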
Lemma 28. There exists $0 < \gamma_0 < 1$ such that for any positive real number $\gamma < \gamma_0$, there exists an integer 
$n_1(\gamma)$ with 

$$\mathbf{P}(L_n < \gamma^n) \le 6 \cdot 4^n \gamma^{n/201}, \qquad n \ge n_1(\gamma).$$

Proof. The joint distribution of the $x$-coordinates of the vertical lines in the tree developed up to level $n$ is 
complex. In particular, it is not that of independent uniform points on $[0,1]$. However, we can use a simple 
coupling with a family of i.i.d. random points on $[0,1]^2$ that yields a good enough lower bound on $L_n$. 

Let $\xi_i = (U_i, V_i)$, $i \ge 1$, be i.i.d. uniform random points on $[0,1]^2$. Let $T_k$ be the quadtree obtained by 
inserting the random points $\xi_i$, $1 \le i \le k$, in this order. Write $D_i$ for the depth at which the point $\xi_i$ is 
inserted; so for instance $D_1 = 0$. Let $K_n$ be the first $k$ for which the tree $T_k$ is complete up to level $n$; we 
mean here that $T_k$ should have $4^n$ cells at level $n$, so it should have $4^{n-1}$ nodes at level $n-1$. Then, by 
definition $\{\xi_i : i \ge 1, D_i < n\}$ has the distribution of the set of points used to construct the process $Z_n$. 
Obviously, $\{\xi_i : i \ge 1, D_i < n\} \subseteq \{\xi_i : 1 \le i \le K_n\}$ and for any real number $x > 0$, 

$$\mathbf{P}(L_n < \gamma^n) \le \mathbf{P}\left(\exists\, i, j \le K_n : i \ne j,\ |U_i - U_j| < \gamma^n\right)$$

$$\le \mathbf{P}\left(\exists\, i, j \le x^n : i \ne j,\ |U_i - U_j| < \gamma^n\right) + \mathbf{P}(K_n > x^n)$$

$$\le x^{2n} \cdot 2\gamma^n + \mathbf{P}(K_n > x^n),$$

by the union bound. The random variable $K_n$ is related to the fill-up level of a random quadtree, which has 
been studied in [7] (see also [8]). We could not find a reference giving a precise tail bound, so we proved 
one here in Lemma 27. We obtain 

$$\mathbf{P}(K_n > x^n) = \mathbf{P}(F_{x^n} < n) \le 4\left(4x^{-1/100}\right)^n,$$

as long as $x \ge 22$ and $n \ge n_0(x)$ (the condition for the bound in Lemma 27 to hold). It follows readily that 

$$\mathbf{P}(L_n < \gamma^n) \le 2\left(x^2\gamma\right)^n + 4\left(4x^{-1/100}\right)^n$$

$$\le 6 \cdot 4^n \gamma^{n/201},$$

upon choosing x = [4100/201^-100/201^ ^^^^^ ^2^ ^ Ax'^^^^^) and 7 < 4 • 22-^-^^ which implies 
$x \ge 22$. This completes the proof. □ 